This script uses the following packages:

# library(ggplot2)
# library(tidyr)
# library(lme4)
# library(lmerTest)
# library(nlme)
# library(knitr)
# library(kableExtra)
## 'data.frame':    206254 obs. of  11 variables:
##  $ Sub.Num             : int  501 501 501 501 501 501 501 501 501 501 ...
##  $ Tr.Num              : int  2 2 2 2 2 2 2 2 2 2 ...
##  $ Order               : chr  "CN-1A" "CN-1A" "CN-1A" "CN-1A" ...
##  $ Condition           : chr  "pre-switch" "pre-switch" "pre-switch" "pre-switch" ...
##  $ Target              : chr  "slide" "slide" "slide" "slide" ...
##  $ Target.Side         : chr  "l" "l" "l" "l" ...
##  $ TimeC               : int  -3067 -3033 -3000 -2967 -2933 -2900 -2867 -2833 -2800 -2767 ...
##  $ Accuracy            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ Block               : chr  "1A" "1A" "1A" "1A" ...
##  $ percentMissingFrames: num  10 10 10 10 10 10 10 10 10 10 ...
##  $ Source              : chr  "tobii" "tobii" "tobii" "tobii" ...
##   Sub.Num Tr.Num Order  Condition Target Target.Side TimeC Accuracy Block
## 1     501      2 CN-1A pre-switch  slide           l -3067       NA    1A
## 2     501      2 CN-1A pre-switch  slide           l -3033       NA    1A
## 3     501      2 CN-1A pre-switch  slide           l -3000       NA    1A
## 4     501      2 CN-1A pre-switch  slide           l -2967       NA    1A
## 5     501      2 CN-1A pre-switch  slide           l -2933       NA    1A
## 6     501      2 CN-1A pre-switch  slide           l -2900       NA    1A
##   percentMissingFrames Source
## 1                   10  tobii
## 2                   10  tobii
## 3                   10  tobii
## 4                   10  tobii
## 5                   10  tobii
## 6                   10  tobii

Recall that in this experiment, 5-year-old children completed a slightly altered version of the looking-while-listening (LWL) task. On each trial, children were shown pictures of two familiar objects and heard a sentence identifying one of them by either its color or its name. Trials were blocked so that there were 8 trials using one dimension (pre-switch) and 8 trials using the other dimension (post-switch).

In these analyses, children’s familiar word recognition will be quantified as the proportion of time spent fixating the target image out of the total time fixating either the target or distractor image after the target image was identified. Consistent with prior research using the looking-while-listening (LWL) paradigm, we will calculate this proportion (i.e., average accuracy) during a critical window 300 to 1,800 ms after the onset of the target word.

The window starts at 300 ms because it takes approximately this much time for children to program an eye movement (therefore any fixations between 0 and 300 ms are unlikely to be stimulus-driven and were initiated before children began processing the target word). The window ends at 1,800 ms because children’s attention wanes over time and 1.5 seconds is a long time to stare at an object after hearing its name (therefore changes in fixations after 1,800 ms are also unlikely to be stimulus-driven, as children’s attention wanders and they fixate elsewhere).

The Accuracy column in our data frame has a value of 1 for frames when children were fixating the target image, a value of 0 for frames when they were fixating the distractor image, and a value of NA for frames when they were fixating neither image.

This data set is a combination of Tobii eye-tracker data (downsampled from 60 to 30 Hz) and hand-coded data (at 30 Hz). Therefore, for each trial we have a measure of children’s accuracy in fixating the target image every 33 ms during our critical window: 300, 333, 367, 400, 433, 467, 500, 533, 567, 600, 633, 667, 700, 733, 767, 800, 833, 867, 900, 933, 967, 1000, 1033, 1067, 1100, 1133, 1167, 1200, 1233, 1267, 1300, 1333, 1367, 1400, 1433, 1467, 1500, 1533, 1567, 1600, 1633, 1667, 1700, 1733, 1767, 1800.

By taking the average of these 46 frames for each trial with mean(), we obtain the proportion of frames where children are fixating the target (1s) out of the frames where they are fixating either the target or distractor (1s & 0s). Note that any frames where children are not fixating either image (NAs) are dropped when calculating the average (na.rm=TRUE). This is the standard in LWL/Visual World Paradigm (VWP) research, rather than scoring fixations outside either AOI as an accuracy of 0.
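
As a toy sketch (not part of the actual pipeline), this is how mean() with na.rm=TRUE behaves on a handful of frames:

mean(c(1, 0, NA, 1))               # NA: the frame away from both images propagates
mean(c(1, 0, NA, 1), na.rm = TRUE) # 0.67: the NA frame is dropped, 2 target frames out of 3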

We have data for 57 subjects.

Aggregating

For our plots and analyses we’ll need to aggregate accuracy over 3 levels:

  • first, averaging across the frames (maximum of 46) for each trial and each participant
  • second, averaging across the trials (maximum of 8) for each condition and each participant
  • third, averaging across the participants (maximum of 57) for each condition

Let’s take this step by step and look at the output.

By Trial

Here’s the first step:

dplyr syntax:

filter() to only include rows for trials with sufficient data and rows for frames in our critical window
group_by() to keep the data at the subject and trial level and collapse across time (we include Condition here because otherwise this column would be dropped; alternatively, we could include Condition in the summarise() step)
summarise() to calculate the average accuracy when collapsing across time (I have also included Frames for illustrative purposes, but it is not necessary)

byTrial = d %>% 
  filter(percentMissingFrames < 50,
    TimeC >= 300 & TimeC <=1800) %>% 
  group_by(Sub.Num,Condition,Tr.Num) %>% 
  summarise(
    Frames = sum(!is.na(Accuracy)),
    Accuracy = mean(Accuracy,na.rm=T))

Here is all of the data for subject 501:

Sub.Num Condition Tr.Num Frames Accuracy
501 pre-switch 2 42 0.98
501 pre-switch 3 46 1.00
501 pre-switch 4 46 1.00
501 pre-switch 6 46 1.00
501 pre-switch 7 45 0.82
501 pre-switch 8 42 0.88
501 pre-switch 9 26 1.00
501 post-switch 10 46 1.00
501 post-switch 11 37 0.59
501 post-switch 12 44 0.80
501 post-switch 13 45 0.82
501 post-switch 14 46 1.00
501 post-switch 15 30 0.20
501 post-switch 16 41 0.49
501 post-switch 17 46 0.93

Notice that we have 15 data points for subject 501, 1 per trial. There should be 16 data points (8 trials in each of the 2 conditions), but we are missing data for trial 5 in the pre-switch condition, because we removed trials where the child was fixating either image for fewer than 23 of the 46 frames (50%) in the critical window.

Our data frame has columns indicating the number of frames the child spent fixating either the target or distractor image during the critical window for each trial (maximum of 46) and what proportion of those fixations (i.e., their average accuracy) were fixations to the target image. For subject 501, accuracies range from 0.2 to 1.

By Subject

The next step is to collapse across trials within each condition for each subject:

bySub = byTrial %>% 
  group_by(Sub.Num,Condition) %>% 
  summarise(
    Trials = sum(!is.na(Accuracy)),
    Accuracy = mean(Accuracy,na.rm=T))

Here is all of the data for subject 501:

Sub.Num Condition Trials Accuracy
501 pre-switch 7 0.95
501 post-switch 8 0.73

Notice that we now have only 2 data points for subject 501 (1 per condition). Looking at the Trials column, we see again that 1 trial was excluded in the pre-switch condition, but all trials were included in the post-switch condition. The Accuracy column now averages across the previous averages (which we calculated across frames within the critical window). So 501 had accuracies of 0.98, 1, 1, 1, 0.82, 0.88, and 1 on the 7 trials in the pre-switch condition, and the average of these 7 trial-level accuracies is 0.95.
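
A quick arithmetic check of that by-subject average, using the trial-level accuracies from the table above:

mean(c(0.98, 1, 1, 1, 0.82, 0.88, 1)) # ~0.95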

By Group

The final step is to collapse across subjects within each condition:

For plotting purposes, we want to calculate several metrics about the average accuracy for each condition:

  1. Subjects is the number of subjects contributing an average accuracy
  2. SD is the SD of the average accuracies across subjects
  3. SE is calculated using our previous two outputs: dividing SD by the square root of Subjects
  4. Accuracy is the average accuracy across subjects; note: it’s important that this variable matches the variable name in the bySub data frame, because we will be using both for our plots (more on this in a second)
  5. lower will be used to plot the SE bars and uses two of our previous outputs: subtracting 1 SE from the mean Accuracy
  6. upper is the same as lower, but adding 1 SE
byGroup = bySub %>% 
  group_by(Condition) %>% 
  summarise(
    Subjects = sum(!is.na(Accuracy)),
    SD = sd(Accuracy,na.rm=TRUE),
    SE = SD/sqrt(Subjects),
    Accuracy = mean(Accuracy,na.rm=TRUE),
    lower=Accuracy-SE,
    upper=Accuracy+SE)

Here is all of the data:

Condition Subjects SD SE Accuracy lower upper
pre-switch 56 0.08 0.01 0.87 0.86 0.88
post-switch 56 0.09 0.01 0.83 0.82 0.84

Notice that we only have data for 56 participants in each condition, but in our full data set we had 57 participants. Subject 518 was dropped in the process. This is because they were missing too many frames during the critical window for every trial:

Condition Tr.Num percentMissingFrames
pre-switch 2 100.00
pre-switch 3 100.00
pre-switch 4 100.00
pre-switch 5 100.00
pre-switch 6 100.00
pre-switch 7 100.00
pre-switch 8 100.00
pre-switch 9 100.00
post-switch 10 100.00
post-switch 11 100.00
post-switch 12 100.00
post-switch 13 100.00
post-switch 14 100.00
post-switch 15 100.00
post-switch 16 100.00
post-switch 17 98.89

This is because the Tobii did not track this child’s gaze well, and we were unable to hand-code their gaze because the quality of the video recording from their session was too poor.
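
A quick way to confirm which subject dropped out is to compare the subject lists before and after aggregating (a sketch; it should return 518 here):

setdiff(unique(d$Sub.Num), unique(byTrial$Sub.Num)) # subjects with no usable trials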

Plots

We’re going to be plotting children’s accuracy separately for the pre-switch and post-switch conditions. Condition in our dataframes right now is a character, so ggplot will order our x-axis alphabetically by default. But the alphabetical order (post-switch then pre-switch) is the opposite of how they actually occurred during the experiment. We can control the order in which our conditions appear by converting them into a factor with the correct ordering:

byTrial$Condition = factor(byTrial$Condition,c('pre-switch','post-switch'))
bySub$Condition = factor(bySub$Condition,c('pre-switch','post-switch'))
byGroup$Condition = factor(byGroup$Condition,c('pre-switch','post-switch'))
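
A quick (optional) check that the new ordering took effect:

levels(byGroup$Condition) # should now be "pre-switch" "post-switch"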

Bar Plot

Traditionally, we would plot children’s average accuracy using a bar plot.

Let’s use this plot as an opportunity to illustrate how ggplot is designed to be flexible: you can provide it with minimal code and it will determine the appropriate graphical settings (e.g., the ranges of the x- and y-axes).

Let’s start with the minimum:

ggplot syntax:

  1. we give ggplot the dataframe to use (byGroup) and which columns to use for the x-axis (Condition), the y-axis (Accuracy), and the fill color (Condition)
  2. geom_bar() to get a bar plot
  3. geom_errorbar() to add error bars and specify the columns used so that it corresponds to +/- 1 SE
ggplot(byGroup,aes(x=Condition,y=Accuracy,fill=Condition)) +
  geom_bar(stat='identity')+
  geom_errorbar(aes(ymin=lower, ymax=upper))

Yuck! Let’s make this better…

3a. make the error bars an appropriate size with width=.5

ggplot(byGroup,aes(x=Condition,y=Accuracy,fill=Condition)) +
  geom_bar(stat='identity')+
  geom_errorbar(width=.5, aes(ymin=lower, ymax=upper))

  4. add a horizontal line with geom_hline() to indicate chance performance (i.e., 50%, which is equal time fixating the target and distractor) and fix its aesthetics
ggplot(byGroup,aes(x=Condition,y=Accuracy,fill=Condition)) +
  geom_bar(stat='identity')+
  geom_errorbar(width=.5, aes(ymin=lower, ymax=upper)) + 
  geom_hline(yintercept=.5,linetype='dashed',color='black')

  5. coord_cartesian() to change the axes so that our bar plot isn’t floating
ggplot(byGroup,aes(x=Condition,y=Accuracy,fill=Condition)) +
  geom_bar(stat='identity')+
  geom_errorbar(width=.5, aes(ymin=lower, ymax=upper)) + 
  geom_hline(yintercept=.5,linetype='dashed',color='black') +
  coord_cartesian(ylim=c(0,1),xlim=c(.5,2.5),expand=F)

  6. scale_y_continuous() to change the breaks in the y-axis to occur more regularly (i.e., every 10%)
ggplot(byGroup,aes(x=Condition,y=Accuracy,fill=Condition)) +
  geom_bar(stat='identity')+
  geom_errorbar(width=.5, aes(ymin=lower, ymax=upper)) + 
  geom_hline(yintercept=.5,linetype='dashed',color='black') +
  coord_cartesian(ylim=c(0,1),xlim=c(.5,2.5),expand=F) + 
  scale_y_continuous(breaks=seq(from=0,to=1,by=.1))

  7. labs() to make our axis labels more transparent
ggplot(byGroup,aes(x=Condition,y=Accuracy,fill=Condition)) +
  geom_bar(stat='identity')+
  geom_errorbar(width=.5, aes(ymin=lower, ymax=upper)) + 
  geom_hline(yintercept=.5,linetype='dashed',color='black') +
  coord_cartesian(ylim=c(0,1),xlim=c(.5,2.5),expand=F) + 
  scale_y_continuous(breaks=seq(from=0,to=1,by=.1)) +
  labs(x='Condition',y='Accuracy (300 to 1800ms)')

  8. scale_fill_manual() to choose better colors and use a conceptually coherent mapping (red = worse performance)
ggplot(byGroup,aes(x=Condition,y=Accuracy,fill=Condition)) +
  geom_bar(stat='identity')+
  geom_errorbar(width=.5, aes(ymin=lower, ymax=upper)) + 
  geom_hline(yintercept=.5,linetype='dashed',color='black') +
  coord_cartesian(ylim=c(0,1),xlim=c(.5,2.5),expand=F) + 
  scale_y_continuous(breaks=seq(from=0,to=1,by=.1)) +
  labs(x='Condition',y='Accuracy (300 to 1800ms)') +
  scale_fill_manual(values=c('dodgerblue','coral2'))

  9. theme_bw() to remove the gray background and increase the minimum font size
ggplot(byGroup,aes(x=Condition,y=Accuracy,fill=Condition)) +
  geom_bar(stat='identity')+
  geom_errorbar(width=.5, aes(ymin=lower, ymax=upper)) + 
  geom_hline(yintercept=.5,linetype='dashed',color='black') +
  coord_cartesian(ylim=c(0,1),xlim=c(.5,2.5),expand=F) + 
  scale_y_continuous(breaks=seq(from=0,to=1,by=.1)) +
  labs(x='Condition',y='Accuracy (300 to 1800ms)') +
  scale_fill_manual(values=c('dodgerblue','coral2')) +
  theme_bw(base_size=14)

  10. theme() to remove the legend (which is redundant with our x-axis labels)
ggplot(byGroup,aes(x=Condition,y=Accuracy,fill=Condition)) +
  geom_bar(stat='identity')+
  geom_errorbar(width=.5, aes(ymin=lower, ymax=upper)) + 
  geom_hline(yintercept=.5,linetype='dashed',color='black') +
  coord_cartesian(ylim=c(0,1),xlim=c(.5,2.5),expand=F) + 
  scale_y_continuous(breaks=seq(from=0,to=1,by=.1)) +
  labs(x='Condition',y='Accuracy (300 to 1800ms)') +
  scale_fill_manual(values=c('dodgerblue','coral2')) +
  theme_bw(base_size=14) +
  theme(legend.position='none') 

  11. geom_point() to add in individual subjects’ data
ggplot(byGroup,aes(x=Condition,y=Accuracy,fill=Condition)) +
  geom_bar(stat='identity')+
  geom_errorbar(width=.5, aes(ymin=lower, ymax=upper)) + 
  geom_point(data=bySub)+
  geom_hline(yintercept=.5,linetype='dashed',color='black') +
  coord_cartesian(ylim=c(0,1),xlim=c(.5,2.5),expand=F) + 
  scale_y_continuous(breaks=seq(from=0,to=1,by=.1)) +
  labs(x='Condition',y='Accuracy (300 to 1800ms)') +
  scale_fill_manual(values=c('dodgerblue','coral2')) +
  theme_bw(base_size=14) +
  theme(legend.position='none') 

  12. fix geom_point() to add some horizontal jitter so we can more clearly see overlapping data points, and increase the y-axis limit so points are not clipped
ggplot(byGroup,aes(x=Condition,y=Accuracy,fill=Condition)) +
  geom_bar(stat='identity')+
  geom_errorbar(width=.5, aes(ymin=lower, ymax=upper)) + 
  geom_point(data=bySub, position=position_jitter(width = .05))+
  geom_hline(yintercept=.5,linetype='dashed',color='black') +
  coord_cartesian(ylim=c(0,1.05),xlim=c(.5,2.5),expand=F) + 
  scale_y_continuous(breaks=seq(from=0,to=1,by=.1)) +
  labs(x='Condition',y='Accuracy (300 to 1800ms)') +
  scale_fill_manual(values=c('dodgerblue','coral2')) +
  theme_bw(base_size=14) +
  theme(legend.position='none') 

Violin plot

I have moved away from bar plots with jittered data points, because I don’t think they do a good job conveying the distribution across participants. I also think there’s some research showing biases that lead folks to misinterpret barplots. So, I’ve shifted to using violin plots instead.

This involves a few changes:

  1. use geom_violin() instead of geom_point() for the bySub data, move this to be plotted first, and make it slightly transparent with alpha=.8
  2. use geom_point() instead of geom_bar() for the byGroup data and modify its aesthetics
  3. reduce the width of the error bars
ggplot(byGroup,aes(x=Condition,y=Accuracy,fill=Condition)) +
  geom_violin(data=bySub,alpha=.8)+
  geom_point(shape=16,fill='black',colour='black',size=2)+
  geom_errorbar(width=.2, aes(ymin=lower, ymax=upper)) + 
  geom_hline(yintercept=.5,linetype='dashed',color='black') +
  coord_cartesian(ylim=c(0,1.05),xlim=c(.5,2.5),expand=F) + 
  scale_y_continuous(breaks=seq(from=0,to=1,by=.1)) +
  labs(x='Condition',y='Accuracy (300 to 1800ms)') +
  scale_fill_manual(values=c('dodgerblue','coral2')) +
  theme_bw(base_size=14) +
  theme(legend.position='none') 

Analyses

There are several different models we can use when analyzing the data:

  • keeping the data at the individual trial level & using linear mixed effects modeling
  • keeping the data at the subject level & using linear mixed effects modeling
  • keeping the data at the subject level with difference scores & using linear models

As far as I can tell, there is no established preference in the literature between the first two options. But here are some things to consider:

1. (pro mixed effect at subject level) linear models assume normally distributed (residual) errors, which for proportion data means the DVs you are trying to predict (either accuracies at the individual trial level or accuracies at the subject level) should mostly fall between 0.3 and 0.7, with only a few cases near the 0 and 1 boundaries. these extreme values in accuracy are more common at the individual trial level - it’s more likely to have an individual trial where the child looked at the target image for the entire 1.5 second window than it is to have an individual child who looked at the target image for the entire 1.5 second window on all 8 trials in the condition (a quick way to eyeball this is sketched after this list).

2. (pro mixed effect at the trial level) the linear mixed effects model at the subject level is discarding information, specifically inter-trial variability. for example, subject 1 and 2 may both have 50% accuracy averaged across the 8 trials in the post-switch condition, but subject 1 was consistent (50% accuracy on all 8 trials) while subject 2 was inconsistent (having 100% accuracy on 4 trials and 0% accuracy on the other 4 trials). removing this variance in the model at the subject level reduces the standard error, which will likely result in smaller p-values. moreover, the linear mixed effects model at the subject level is essentially operating under the assumption that each trial is a repeated sample from the same distribution and we can therefore average across them. if, however, we expect some trials to be more difficult than others (e.g., familiar words with later age of acquisition) then we really should keep the data at the individual trial level and consider including a random effect for items (more on this later)

3. (pro linear model) there is no transparent or agreed upon way to determine degrees of freedom and effect sizes for linear mixed effects models. therefore, we’ll be using approximations when conducting tests of significance (more on this later). generally, the different approximations will yield similar patterns of results, but not always (especially if you have marginal effects). as you’ll see, the linear models are clumsier to fit (we’ll be fitting a difference score rather than our actual accuracies) but in some cases - especially if you want to determine effect sizes for power analyses - they may be better.
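
A quick (optional) way to eyeball that distributional point using the data frames we built above is to histogram the accuracies at each level:

hist(byTrial$Accuracy, breaks = 20) # trial-level accuracies: expect more pile-up at 0 and 1
hist(bySub$Accuracy, breaks = 20)   # subject-level accuracies: expect a less extreme distribution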

Let’s walk through each of the different models.

Individual Trial Level

Before we fit our model let’s convert the independent variable (Condition) into a factor.

Factor syntax:

  1. we re-assign the values of Condition to itself after converting them to be a factor(). the order in which we specify each level in the factor c('pre-switch','post-switch') is the order in which they will appear in tables and plots (if we had left them as strings, R would otherwise display them in alphabetical order). recall that we previously did this when making our plots, so we don’t actually need to run this first command again.
  2. we set the contrasts for each level so that cells where Condition is ‘pre-switch’ receive a value of 0 and cells where Condition is ‘post-switch’ receive a value of 1. this has several effects in our model results:
  • first, the beta coefficient for Condition tells us how much accuracy increases for a 1-point increase in Condition. with our contrasts specified in the order that they are, this is how much accuracy changes when moving from the pre-switch to the post-switch condition.
  • second, the contrast coding affects the interpretation of lower-order effects in our model. the beta coefficient for our intercept is the average accuracy where Condition = 0. with our contrasts specified as they are, the intercept will tell us the average accuracy for the pre-switch condition.
  3. we assign an attribute label to the contrasts as :pre because we centered our factor on the pre-switch condition
# byTrial$Condition = factor(byTrial$Condition,c('pre-switch','post-switch'))
contrasts(byTrial$Condition) = c(0,1)
colnames(attr(byTrial$Condition,"contrasts")) = ':pre'
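
If you want to double-check the coding (optional), printing the contrasts should show a single column named ':pre' with pre-switch = 0 and post-switch = 1:

contrasts(byTrial$Condition)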

lme4

Now we fit a linear mixed effects model using the lmer() function from the lme4 package

lmer() syntax:

we are assigning the output of the model to a new variable m

Accuracy ~ specifies the dependent variable that we are predicting
we subtract .5 from Accuracy so that the intercept will compare children’s performance against chance. Without this adjustment, the intercept would tell us whether children’s accuracy is significantly greater than 0. But we really care about whether children’s accuracy is significantly greater than 0.5

Condition is the fixed effect, the model will also include an intercept by default, if for some reason you do not want an intercept you would add 0 + Condition instead.
for the purposes of this tutorial we are not including other fixed effects (like children’s performance on the DCCS a measure of Executive Function), but if we did we could add that using + DCCS
if we wanted to include the interaction between Condition and DCCS we could do that by additionally adding + Condition:DCCS, or more simply we could specify the entire fixed effect structure using Condition*DCCS; the * tells R to include all fixed effects and possible interactions

(1|Sub.Num) adds a random intercept for each participant
this is necessary to account for the non-independence when we have multiple data points per participant (e.g., a child who is very accurate on trial 1 will likely be very accurate on trial 2, because they have strong language skills)
there’s some advice to ‘keep it maximal’ with random effects. if we followed this recommendation we should also include a random effect for all within-subject fixed effects in our model (i.e., Condition). to do this we would simply include (1+Condition|Sub.Num). this is another gray area in the research literature and there are some statisticians (e.g., Jake Oleson at Iowa) who argue that the full random effects structure often does not improve model fit and in many models is not even possible (as we’ll see later)
we’ll fit the model with both random effect structures to see how the results change.

finally, data=byTrial tells lmer() which dataframe to use

m = lmer(Accuracy-.5 ~ Condition + (1|Sub.Num),data=byTrial)

If we look at the summary of our model results, notice how there are t-values, but no p-values. This was an intentional choice by the lme4 team, because they want you to be aware of the fact that you need to choose an approximation method to estimate degrees of freedom and therefore significance.

summary(m)
## Linear mixed model fit by REML ['lmerMod']
## Formula: Accuracy - 0.5 ~ Condition + (1 | Sub.Num)
##    Data: byTrial
## 
## REML criterion at convergence: -254.3
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.2862 -0.4058  0.4168  0.6330  0.8744 
## 
## Random effects:
##  Groups   Name        Variance                  Std.Dev.       
##  Sub.Num  (Intercept) 0.00000000000000000006936 0.0000000002634
##  Residual             0.04132569348100680051150 0.2032872191777
## Number of obs: 766, groups:  Sub.Num, 56
## 
## Fixed effects:
##               Estimate Std. Error t value
## (Intercept)    0.37133    0.01040  35.701
## Condition:pre -0.04908    0.01469  -3.341
## 
## Correlation of Fixed Effects:
##             (Intr)
## Conditin:pr -0.708
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see ?isSingular

Notice that our random intercept is accounting for 0 variance. This is the reason that the model fit generated the warning: boundary (singular) fit: see ?isSingular. The model was able to converge on estimates for our fixed effects, but the random effect structure isn’t doing much for us (one reason to consider just conducting a linear regression later).
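
If you prefer a programmatic check, lme4 provides isSingular():

isSingular(m) # TRUE when a random-effect variance (or correlation) is estimated at the boundary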

To approximate degrees of freedom and significance, I prefer to use the Kenward-Roger approximation, here via Anova() from the car package:

Anova(m,type=3,test='F')
## Analysis of Deviance Table (Type III Wald F tests with Kenward-Roger df)
## 
## Response: Accuracy - 0.5
##                    F Df Df.res                Pr(>F)    
## (Intercept) 1272.932  1 195.20 < 0.00000000000000022 ***
## Condition     11.156  1 720.99             0.0008808 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

And here are our model results if we include the full random effects structure:

m = lmer(Accuracy-.5 ~ Condition + (1+Condition|Sub.Num),data=byTrial)
## Linear mixed model fit by REML ['lmerMod']
## Formula: Accuracy - 0.5 ~ Condition + (1 + Condition | Sub.Num)
##    Data: byTrial
## 
## REML criterion at convergence: -255.7
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.3457 -0.3966  0.4021  0.6638  1.0619 
## 
## Random effects:
##  Groups   Name          Variance  Std.Dev. Corr 
##  Sub.Num  (Intercept)   0.0002186 0.01479       
##           Condition:pre 0.0025768 0.05076  -1.00
##  Residual               0.0405792 0.20144       
## Number of obs: 766, groups:  Sub.Num, 56
## 
## Fixed effects:
##               Estimate Std. Error t value
## (Intercept)    0.37105    0.01050  35.333
## Condition:pre -0.04833    0.01610  -3.001
## 
## Correlation of Fixed Effects:
##             (Intr)
## Conditin:pr -0.710
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see ?isSingular
## Analysis of Deviance Table (Type III Wald F tests with Kenward-Roger df)
## 
## Response: Accuracy - 0.5
##                     F Df Df.res                Pr(>F)    
## (Intercept) 1244.0130  1 52.636 < 0.00000000000000022 ***
## Condition      8.9797  1 53.462              0.004133 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Notice that neither random effect (intercept or condition slope) is accounting for much variance (<.003) and we again get the isSingular warning.

The estimated fixed effect changes slightly as does the p-value, but the overall pattern is the same. Notice, however, that we have a more appropriate estimate for the degrees of freedom for the fixed effect of Condition (56 participants - 1 intercept - 1 condition effect = 54).

I would write this up as something like:

There was a significant effect of condition on children’s accuracy, b = -0.05, F(1,53.5)=8.98, p < .01. Children’s accuracy in familiar word recognition was higher before the dimensional switch (M = 0.87, SD = 0.08) than after the dimensional switch (M = 0.83, SD = 0.09).

Recall that the intercept is for the pre-switch condition. To compare accuracy in the post-switch condition to chance we just re-center the contrasts for our factor and refit the model:

contrasts(byTrial$Condition) = c(1,0)
colnames(attr(byTrial$Condition,"contrasts")) = ':post'

m = lmer(Accuracy-.5 ~ Condition + (1+Condition|Sub.Num),data=byTrial)
## Linear mixed model fit by REML ['lmerMod']
## Formula: Accuracy - 0.5 ~ Condition + (1 + Condition | Sub.Num)
##    Data: byTrial
## 
## REML criterion at convergence: -255.7
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.3457 -0.3966  0.4021  0.6637  1.0620 
## 
## Random effects:
##  Groups   Name           Variance Std.Dev. Corr 
##  Sub.Num  (Intercept)    0.001295 0.03598       
##           Condition:post 0.002578 0.05077  -1.00
##  Residual                0.040579 0.20144       
## Number of obs: 766, groups:  Sub.Num, 56
## 
## Fixed effects:
##                Estimate Std. Error t value
## (Intercept)     0.32272    0.01139  28.345
## Condition:post  0.04833    0.01611   3.001
## 
## Correlation of Fixed Effects:
##             (Intr)
## Conditn:pst -0.760
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see ?isSingular
## Analysis of Deviance Table (Type III Wald F tests with Kenward-Roger df)
## 
## Response: Accuracy - 0.5
##                    F Df Df.res                Pr(>F)    
## (Intercept) 801.2513  1 53.223 < 0.00000000000000022 ***
## Condition     8.9792  1 53.462              0.004134 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

lmerTest

Another approximation method is the Satterthwaite approximation, which we can get by using the lmerTest package to fit our model:

m = lmerTest::lmer(Accuracy-.5 ~ Condition + (1+Condition|Sub.Num),data=byTrial)
summary(m)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Accuracy - 0.5 ~ Condition + (1 + Condition | Sub.Num)
##    Data: byTrial
## 
## REML criterion at convergence: -255.7
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.3457 -0.3966  0.4021  0.6637  1.0620 
## 
## Random effects:
##  Groups   Name           Variance Std.Dev. Corr 
##  Sub.Num  (Intercept)    0.001295 0.03598       
##           Condition:post 0.002578 0.05077  -1.00
##  Residual                0.040579 0.20144       
## Number of obs: 766, groups:  Sub.Num, 56
## 
## Fixed effects:
##                Estimate Std. Error       df t value             Pr(>|t|)    
## (Intercept)     0.32272    0.01139 54.61652  28.345 < 0.0000000000000002 ***
## Condition:post  0.04833    0.01611 58.09008   3.001              0.00396 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr)
## Conditn:pst -0.760
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see ?isSingular

This package automatically includes tests of significance (using t- rather than F-values). But again, the pattern of results is very similar.

nlme

Finally, here’s what it would look like if we instead fit the model using the nlme package:

m = nlme::gls(Accuracy-.5 ~ Condition, corr=nlme::corCompSymm(form=~Condition|Sub.Num),
              na.action=na.omit,data=byTrial,method='REML')

summary(m)
## Generalized least squares fit by REML
##   Model: Accuracy - 0.5 ~ Condition 
##   Data: byTrial 
##         AIC       BIC   logLik
##   -246.3016 -227.7473 127.1508
## 
## Correlation Structure: Compound symmetry
##  Formula: ~Condition | Sub.Num 
##  Parameter estimate(s):
##          Rho 
## -0.002257671 
## 
## Coefficients:
##                    Value  Std.Error   t-value p-value
## (Intercept)    0.3221654 0.01030141 31.273917  0.0000
## Condition:post 0.0490928 0.01470472  3.338574  0.0009
## 
##  Correlation: 
##                (Intr)
## Condition:post -0.712
## 
## Standardized residuals:
##        Min         Q1        Med         Q3        Max 
## -4.2859237 -0.4054255  0.4172068  0.6333110  0.8748100 
## 
## Residual standard error: 0.2032837 
## Degrees of freedom: 766 total; 764 residual

Caution: I am the least familiar with using gls() for mixed effects modeling. There are strong pros to using this approach, because you can specify different structures for the correlations among the repeated observations within a subject:

  • corCompSymm assumes compound symmetry (all pairs of observations within a subject are equally correlated)
  • corSymm allows an unstructured correlation matrix where the correlations are not assumed to be equal
  • corAR1 has the correlations decay over time.

I do not know exactly what these differences entail or how to determine which is most appropriate for your data set. But I mention them in case folks are interested in following up on this on their own to learn more (it is on my to-learn list in the near future). gls() also works with simple linear regression (you remove the correlation matrix), which is nice.
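
As a rough sketch (I have not verified that these alternatives converge on this data set), swapping structures only means changing the corr= argument in the gls() call above:

# corr=nlme::corCompSymm(form=~Condition|Sub.Num)  # compound symmetry (used above)
# corr=nlme::corSymm(form=~1|Sub.Num)              # unstructured correlations
# corr=nlme::corAR1(form=~1|Sub.Num)               # correlations decay with lag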

A brief aside: both lmer() and lm() models assume that the variance is the same between each level of your fixed effect. This might not be the case, however, especially if you are comparing children who are typically developing (who usually show less variability in their performance) to children with atypical development (where there is usually a wider range of performance across children within the group). With gls() you can relax this assumption by adding weights=nlme::varIdent(form=~1|Dx). Or, in our model, if there were more variability in post-switch compared to pre-switch performance (which is not the case based on our earlier plots), we could add weights=nlme::varIdent(form=~1|Condition). You then compare this new model, which allows heterogeneous variance, to the original gls() model that assumed homogeneous variance, and determine whether one fits better by checking whether its AIC value is lower than the other’s (usually by at least 2 points).
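
Here is a sketch of what that comparison could look like (assuming the same gls() model from above; I have not run this version):

m_hom = nlme::gls(Accuracy-.5 ~ Condition, corr=nlme::corCompSymm(form=~Condition|Sub.Num),
                  na.action=na.omit, data=byTrial, method='REML')
m_het = nlme::gls(Accuracy-.5 ~ Condition, corr=nlme::corCompSymm(form=~Condition|Sub.Num),
                  weights=nlme::varIdent(form=~1|Condition),
                  na.action=na.omit, data=byTrial, method='REML')

AIC(m_hom, m_het)    # the model with the lower AIC (by ~2+ points) is preferred
anova(m_hom, m_het)  # likelihood ratio test between the two variance structures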

Subject Level

To simplify things, moving forward we will just focus on using the lme4 package and the Kenward-Roger approximation for linear mixed effects modeling.

We again need to update our Condition column to be a factor, since we are using a new dataframe:

# bySub$Condition = factor(bySub$Condition,c('pre-switch','post-switch'))
contrasts(bySub$Condition) = c(0,1)
colnames(attr(bySub$Condition,"contrasts")) = ':pre'

Let’s fit the model with just the random intercept:

m = lmer(Accuracy-.5 ~ Condition + (1|Sub.Num),data=bySub)
## Linear mixed model fit by REML ['lmerMod']
## Formula: Accuracy - 0.5 ~ Condition + (1 | Sub.Num)
##    Data: bySub
## 
## REML criterion at convergence: -222.2
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.76374 -0.62793  0.06122  0.76353  2.03449 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Sub.Num  (Intercept) 0.000000 0.00000 
##  Residual             0.007218 0.08496 
## Number of obs: 112, groups:  Sub.Num, 56
## 
## Fixed effects:
##               Estimate Std. Error t value
## (Intercept)    0.37249    0.01135  32.809
## Condition:pre -0.04534    0.01606  -2.824
## 
## Correlation of Fixed Effects:
##             (Intr)
## Conditin:pr -0.707
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see ?isSingular

Notice that our random intercept is again accounting for 0 variance and we again get the warning: boundary (singular) fit: see ?isSingular.

Anova(m,type=3,test='F')
## Analysis of Deviance Table (Type III Wald F tests with Kenward-Roger df)
## 
## Response: Accuracy - 0.5
##                     F Df Df.res                Pr(>F)    
## (Intercept) 1076.4034  1    110 < 0.00000000000000022 ***
## Condition      7.9747  1     55              0.006594 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The pattern of results is broadly similar to what we observed before:

There was a significant effect of condition on children’s accuracy, b = -0.05, F(1,55)=7.97, p < .01.

When analyzing the data at the subject level, however, we cannot include the full random effects structure, because we only have 2 data points per participant (1 accuracy averaging across the pre-switch trials and 1 accuracy averaging across the post-switch trials). Including the random slope for condition would overfit the model, with as many random effects as we have observations. R will generate an error message if you try to fit this model (which is why it is commented out below; otherwise it would prevent the final markdown output from knitting).

# m = lmer(Accuracy-.5 ~ Condition + (1+Condition|Sub.Num),data=bySub)

Linear model

In order to use simple linear regression, we can only have 1 data point per participant. We can finagle this by creating a new dependent variable: subtracting children’s average accuracy in the pre-switch condition from their average accuracy in the post-switch condition. This difference score then becomes our DV, and the intercept in the model will tell us whether this difference score is significantly different from 0 (i.e., whether there’s a significant effect of condition).

In order to do this, we need to use a new tidyr feature:

pivot_wider() will convert the data frame from long format (multiple rows per subject, one for each condition) into wide format (one row per subject, multiple columns for each condition)
we need to first remove the Trials column using select(), because the differing number of trials for each condition would otherwise keep the data from collapsing to one row per subject
we then specify that we will use the values from the Condition column in our original data frame to label the new columns in our new data frame and that the values for these new columns will come from the Accuracy column in our original data frame

bySubWide = bySub %>% 
  select(-Trials) %>% 
  pivot_wider(names_from=Condition,values_from=Accuracy)

Here’s what our new data frame looks like:

Sub.Num post-switch pre-switch
501 0.73 0.95
502 0.80 0.87
503 0.64 0.88
504 0.71 0.87
505 0.95 0.94
506 0.89 0.87

Finally, we just use mutate() to calculate the difference score between these columns:

bySubWide = bySubWide %>% mutate(diff = `post-switch` - `pre-switch`)

And now we can fit our linear regression model:

m = lm(diff ~ 1, data =bySubWide)
summary(m)
## 
## Call:
## lm(formula = diff ~ 1, data = bySubWide)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.27341 -0.06768 -0.00319  0.06931  0.35489 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept) -0.04534    0.01615  -2.807   0.0069 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1209 on 55 degrees of freedom

I use the lm() syntax because we can make the model more complicated by adding in other fixed effects (e.g., DCCS) which would essentially test whether there is a significant interaction between our effect of condition and that fixed effect.
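
For example (hypothetical, since DCCS scores are not part of this data set, so the code below is shown commented out rather than run), that extension would look like:

# hypothetical -- assumes a DCCS column has been merged into bySubWide
# m = lm(diff ~ DCCS, data=bySubWide)
# summary(m) # the DCCS slope tests whether the condition effect changes with DCCS (the interaction)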

If we aren’t including any other fixed effects, however, our lm() is essentially just a t-test:

t.test(bySubWide$`pre-switch`, bySubWide$`post-switch`, paired = TRUE)
## 
##  Paired t-test
## 
## data:  bySubWide$`pre-switch` and bySubWide$`post-switch`
## t = 2.8071, df = 55, p-value = 0.006903
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.01297110 0.07771226
## sample estimates:
## mean of the differences 
##              0.04534168
t.test(bySubWide$diff)
## 
##  One Sample t-test
## 
## data:  bySubWide$diff
## t = -2.8071, df = 55, p-value = 0.006903
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -0.07771226 -0.01297110
## sample estimates:
##   mean of x 
## -0.04534168