This script uses the following packages:
# library(ggplot2)
# library(tidyr)
# library(lme4)
# library(lmerTest)
# library(nlme)
# library(knitr)
# library(kableExtra)
## 'data.frame': 206254 obs. of 11 variables:
## $ Sub.Num : int 501 501 501 501 501 501 501 501 501 501 ...
## $ Tr.Num : int 2 2 2 2 2 2 2 2 2 2 ...
## $ Order : chr "CN-1A" "CN-1A" "CN-1A" "CN-1A" ...
## $ Condition : chr "pre-switch" "pre-switch" "pre-switch" "pre-switch" ...
## $ Target : chr "slide" "slide" "slide" "slide" ...
## $ Target.Side : chr "l" "l" "l" "l" ...
## $ TimeC : int -3067 -3033 -3000 -2967 -2933 -2900 -2867 -2833 -2800 -2767 ...
## $ Accuracy : num NA NA NA NA NA NA NA NA NA NA ...
## $ Block : chr "1A" "1A" "1A" "1A" ...
## $ percentMissingFrames: num 10 10 10 10 10 10 10 10 10 10 ...
## $ Source : chr "tobii" "tobii" "tobii" "tobii" ...
## Sub.Num Tr.Num Order Condition Target Target.Side TimeC Accuracy Block
## 1 501 2 CN-1A pre-switch slide l -3067 NA 1A
## 2 501 2 CN-1A pre-switch slide l -3033 NA 1A
## 3 501 2 CN-1A pre-switch slide l -3000 NA 1A
## 4 501 2 CN-1A pre-switch slide l -2967 NA 1A
## 5 501 2 CN-1A pre-switch slide l -2933 NA 1A
## 6 501 2 CN-1A pre-switch slide l -2900 NA 1A
## percentMissingFrames Source
## 1 10 tobii
## 2 10 tobii
## 3 10 tobii
## 4 10 tobii
## 5 10 tobii
## 6 10 tobii
Recall that in this experiment, 5-year-old children completed a slightly altered version of the looking-while-listening (LWL) task. On each trial, children were shown pictures of two familiar objects and heard a sentence identifying one of them using either its color or its name. Trials were blocked so that there were 8 trials using one dimension (pre-switch) and 8 using the other dimension (post-switch).
In these analyses, children’s familiar word recognition will be quantified as the proportion of time spent fixating the target image out of the total time fixating either the target or distractor image after the target image was identified. Consistent with prior research using the looking-while-listening (LWL) paradigm, we will calculate this proportion (i.e., average accuracy) during a critical window 300 to 1,800 ms after the onset of the target word.
The window starts at 300 ms because it takes approximately this much time for children to program an eye movement (therefore any fixations between 0 and 300 ms are unlikely to be stimulus-driven and were initiated before children began processing the target word). The window ends at 1,800 ms because children's attention wanes over time and 1.5 seconds is a long time to stare at an object after hearing its name (therefore changes in fixations after 1,800 ms are also unlikely to be stimulus-driven, as children's attention wanders and they fixate elsewhere).
The Accuracy column in our data frame has a value of 1 for frames when children were fixating the target image, a value of 0 for frames when they were fixating the distractor image, and a value of NA for frames when they were fixating neither image.
This data set is a combination of Tobii eye-tracker data (downsampled from 60 to 30 Hz) and hand-coded data (at 30 Hz). Therefore, for each trial we have a measure of children's accuracy in fixating the target image every ~33 ms during our critical window: 300, 333, 367, 400, 433, 467, 500, 533, 567, 600, 633, 667, 700, 733, 767, 800, 833, 867, 900, 933, 967, 1000, 1033, 1067, 1100, 1133, 1167, 1200, 1233, 1267, 1300, 1333, 1367, 1400, 1433, 1467, 1500, 1533, 1567, 1600, 1633, 1667, 1700, 1733, 1767, 1800.
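If you want to verify those frame times yourself, a quick check like this should reproduce the list above (assuming d is the full data frame shown at the top of this script):

# List the unique frame times that fall inside the critical window
sort(unique(d$TimeC[d$TimeC >= 300 & d$TimeC <= 1800]))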
By calculating the average of these 46 frames for each trial using mean(), we will identify the proportion of frames where children are fixating the target (1's) out of the frames fixating either the target or distractor (1's & 0's). Note that any frames where children are not fixating either image (NAs) will be dropped when calculating the average (na.rm=TRUE). This is the standard in LWL/Visual World Paradigm (VWP) research, rather than scoring fixations outside either AOI as an accuracy of 0.
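To make the NA handling concrete, here is a toy example (made-up values, not from our data) of how mean() treats the NA frames:

# 3 target frames (1), 1 distractor frame (0), and 1 frame off both images (NA)
acc = c(1, 1, NA, 0, 1)
mean(acc)               # returns NA, because NAs propagate by default
mean(acc, na.rm = TRUE) # 0.75: 3 target frames out of the 4 frames on either image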
We have data for 57 subjects.
For our plots and analyses we'll need to aggregate accuracy over 3 levels: the trial level (byTrial), the subject level (bySub), and the condition level collapsing across subjects (byGroup).
Let’s take this step by step and look at the output.
Here’s the first step:
dplyr syntax:
- filter() to only include rows for trials with sufficient data and rows for frames in our critical window
- group_by() to keep the data at the subject- and trial-level and collapse across time (we include Condition here because otherwise this column would be dropped; alternatively, we could include Condition in the summarise() step)
- summarise() to calculate the average accuracy when collapsing across time (I have also included Frames for illustrative purposes, but it is not necessary)
byTrial = d %>%
  filter(percentMissingFrames < 50,
         TimeC >= 300 & TimeC <= 1800) %>%
  group_by(Sub.Num, Condition, Tr.Num) %>%
  summarise(
    Frames = sum(!is.na(Accuracy)),
    Accuracy = mean(Accuracy, na.rm = T))
Here is all of the data for subject 501:
Sub.Num | Condition | Tr.Num | Frames | Accuracy |
---|---|---|---|---|
501 | pre-switch | 2 | 42 | 0.98 |
501 | pre-switch | 3 | 46 | 1.00 |
501 | pre-switch | 4 | 46 | 1.00 |
501 | pre-switch | 6 | 46 | 1.00 |
501 | pre-switch | 7 | 45 | 0.82 |
501 | pre-switch | 8 | 42 | 0.88 |
501 | pre-switch | 9 | 26 | 1.00 |
501 | post-switch | 10 | 46 | 1.00 |
501 | post-switch | 11 | 37 | 0.59 |
501 | post-switch | 12 | 44 | 0.80 |
501 | post-switch | 13 | 45 | 0.82 |
501 | post-switch | 14 | 46 | 1.00 |
501 | post-switch | 15 | 30 | 0.20 |
501 | post-switch | 16 | 41 | 0.49 |
501 | post-switch | 17 | 46 | 0.93 |
Notice that we have 15 data points for subject 501 (1 per trial). There should be 16 data points (8 trials in each of the 2 conditions), but we are missing data for trial 5 in the pre-switch condition, because we removed trials where the child was fixating either image for less than 23 frames (50%) of the critical window.
Our data frame has columns indicating the number of frames the child spent fixating either the target or distractor image during the critical window for each trial (maximum of 46) and what proportion of those fixations (i.e., their average accuracy) were fixations to the target image. For subject 501, accuracies range from 0.2 to 1.
The next step is to collapse across trials within each condition for each subject:
bySub = byTrial %>%
  group_by(Sub.Num, Condition) %>%
  summarise(
    Trials = sum(!is.na(Accuracy)),
    Accuracy = mean(Accuracy, na.rm = T))
Here is all of the data for subject 501:
Sub.Num | Condition | Trials | Accuracy |
---|---|---|---|
501 | pre-switch | 7 | 0.95 |
501 | post-switch | 8 | 0.73 |
Notice that we now have only 2 data points for subject 501 (1 per condition). Looking at the Trials column, we see again that 1 trial was excluded for the pre-switch condition, but all trials were included in the post-switch condition. The Accuracy column is now averaging across the previous averages (which we calculated across frames within the critical window). So 501 had accuracies of 0.98, 1, 1, 1, 0.82, 0.88, and 1 on the 7 trials in the pre-switch condition, and the average of these 7 average accuracies is 0.95.
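As a quick sanity check, we can reproduce that number by hand from the (rounded) trial accuracies in the table above:

# Average of subject 501's 7 pre-switch trial accuracies
mean(c(0.98, 1, 1, 1, 0.82, 0.88, 1)) # 0.954, which matches the 0.95 above after rounding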
The final step is to collapse across subjects within each condition:
For plotting purposes, we want to calculate several metrics about the average accuracy for each condition:
- Subjects is the number of subjects who had an average accuracy
- SD is the SD of the average accuracies across subjects
- SE is calculated using our previous two outputs: dividing SD by the square root of Subjects
- Accuracy is the average accuracy across subjects. Note: it's important that this variable name matches the variable name in the bySub data frame, because we will be using both for our plots (more on this in a second)
- lower will be used to plot the SE bars and uses two of our previous outputs: subtracting 1 SE from the mean Accuracy
- upper: ditto to lower, but adding 1 SE
byGroup = bySub %>%
  group_by(Condition) %>%
  summarise(
    Subjects = sum(!is.na(Accuracy)),
    SD = sd(Accuracy, na.rm = TRUE),
    SE = SD/sqrt(Subjects),
    Accuracy = mean(Accuracy, na.rm = TRUE),
    lower = Accuracy - SE,
    upper = Accuracy + SE)
Here is all of the data:
Condition | Subjects | SD | SE | Accuracy | lower | upper |
---|---|---|---|---|---|---|
pre-switch | 56 | 0.08 | 0.01 | 0.87 | 0.86 | 0.88 |
post-switch | 56 | 0.09 | 0.01 | 0.83 | 0.82 | 0.84 |
Notice that we only have data for 56 participants in each condition, but in our full data set we had 57 participants. Subject 518 was dropped in the process. This is because they were missing too many frames during the critical window for every trial:
Condition | Tr.Num | percentMissingFrames |
---|---|---|
pre-switch | 2 | 100.00 |
pre-switch | 3 | 100.00 |
pre-switch | 4 | 100.00 |
pre-switch | 5 | 100.00 |
pre-switch | 6 | 100.00 |
pre-switch | 7 | 100.00 |
pre-switch | 8 | 100.00 |
pre-switch | 9 | 100.00 |
post-switch | 10 | 100.00 |
post-switch | 11 | 100.00 |
post-switch | 12 | 100.00 |
post-switch | 13 | 100.00 |
post-switch | 14 | 100.00 |
post-switch | 15 | 100.00 |
post-switch | 16 | 100.00 |
post-switch | 17 | 98.89 |
This is because the Tobii did not track this child's gaze well, and we were unable to hand-code because the quality of the video recording from their session was too poor.
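If you want to confirm this programmatically, here is one way to flag subjects with no usable trials (a sketch using the same 50% cutoff as our filter; it should return only subject 518):

# Count, for each subject, how many trials survive the missing-frames cutoff
d %>%
  group_by(Sub.Num) %>%
  summarise(usableTrials = n_distinct(Tr.Num[percentMissingFrames < 50])) %>%
  filter(usableTrials == 0)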
We're going to be plotting children's accuracy separately for the pre-switch and post-switch conditions. Condition in our dataframes right now is a character, so ggplot will default to ordering our x-axis alphabetically. But the alphabetical order (post-switch then pre-switch) is the opposite of how the conditions actually occurred during the experiment. We can control the order in which our conditions appear by converting them into a factor with the correct ordering:
byTrial$Condition = factor(byTrial$Condition,c('pre-switch','post-switch'))
bySub$Condition = factor(bySub$Condition,c('pre-switch','post-switch'))
byGroup$Condition = factor(byGroup$Condition,c('pre-switch','post-switch'))
Traditionally, we would plot children’s average accuracy using a bar plot.
Let's use this plot as an opportunity to illustrate how ggplot is designed to be flexible in that you can provide it with minimal code and it will determine the appropriate graphical settings (e.g., the ranges of the x- and y-axes).
Let’s start with the minimum:
ggplot syntax:
- we give the dataframe to use (byGroup), and which columns to use for the x-axis (Condition), the y-axis (Accuracy), and the fill color (Condition)
- geom_bar() to get a bar plot
- geom_errorbar() to add error bars, specifying the columns used so that they correspond to +/- 1 SE
ggplot(byGroup,aes(x=Condition,y=Accuracy,fill=Condition)) +
geom_bar(stat='identity')+
geom_errorbar(aes(ymin=lower, ymax=upper))
Yuck! Let's make this better…
- make the error bars appropriately sized with width=.5
ggplot(byGroup,aes(x=Condition,y=Accuracy,fill=Condition)) +
geom_bar(stat='identity')+
geom_errorbar(width=.5, aes(ymin=lower, ymax=upper))
- add a horizontal line with geom_hline() to indicate chance performance (i.e., 50%, which is equal time fixating target and distractor) and fix its aesthetics
ggplot(byGroup,aes(x=Condition,y=Accuracy,fill=Condition)) +
geom_bar(stat='identity')+
geom_errorbar(width=.5, aes(ymin=lower, ymax=upper)) +
geom_hline(yintercept=.5,linetype='dashed',color='black')
- coord_cartesian() to change the axes so that our bar plot isn't floating
ggplot(byGroup,aes(x=Condition,y=Accuracy,fill=Condition)) +
geom_bar(stat='identity')+
geom_errorbar(width=.5, aes(ymin=lower, ymax=upper)) +
geom_hline(yintercept=.5,linetype='dashed',color='black') +
coord_cartesian(ylim=c(0,1),xlim=c(.5,2.5),expand=F)
- scale_y_continuous() to change the breaks in the y-axis to occur more regularly (i.e., every 10%)
ggplot(byGroup,aes(x=Condition,y=Accuracy,fill=Condition)) +
geom_bar(stat='identity')+
geom_errorbar(width=.5, aes(ymin=lower, ymax=upper)) +
geom_hline(yintercept=.5,linetype='dashed',color='black') +
coord_cartesian(ylim=c(0,1),xlim=c(.5,2.5),expand=F) +
scale_y_continuous(breaks=seq(from=0,to=1,by=.1))
- labs() to make our axis labels more transparent
ggplot(byGroup,aes(x=Condition,y=Accuracy,fill=Condition)) +
geom_bar(stat='identity')+
geom_errorbar(width=.5, aes(ymin=lower, ymax=upper)) +
geom_hline(yintercept=.5,linetype='dashed',color='black') +
coord_cartesian(ylim=c(0,1),xlim=c(.5,2.5),expand=F) +
scale_y_continuous(breaks=seq(from=0,to=1,by=.1)) +
labs(x='Condition',y='Accuracy (300 to 1800ms)')
- scale_fill_manual() to choose better colors and use a conceptually coherent mapping (red = worse performance)
ggplot(byGroup,aes(x=Condition,y=Accuracy,fill=Condition)) +
geom_bar(stat='identity')+
geom_errorbar(width=.5, aes(ymin=lower, ymax=upper)) +
geom_hline(yintercept=.5,linetype='dashed',color='black') +
coord_cartesian(ylim=c(0,1),xlim=c(.5,2.5),expand=F) +
scale_y_continuous(breaks=seq(from=0,to=1,by=.1)) +
labs(x='Condition',y='Accuracy (300 to 1800ms)') +
scale_fill_manual(values=c('dodgerblue','coral2'))
- theme_bw() to remove the gray background and increase the minimum font size
ggplot(byGroup,aes(x=Condition,y=Accuracy,fill=Condition)) +
geom_bar(stat='identity')+
geom_errorbar(width=.5, aes(ymin=lower, ymax=upper)) +
geom_hline(yintercept=.5,linetype='dashed',color='black') +
coord_cartesian(ylim=c(0,1),xlim=c(.5,2.5),expand=F) +
scale_y_continuous(breaks=seq(from=0,to=1,by=.1)) +
labs(x='Condition',y='Accuracy (300 to 1800ms)') +
scale_fill_manual(values=c('dodgerblue','coral2')) +
theme_bw(base_size=14)
- theme() to remove the legend (which is redundant with our x-axis labels)
ggplot(byGroup,aes(x=Condition,y=Accuracy,fill=Condition)) +
geom_bar(stat='identity')+
geom_errorbar(width=.5, aes(ymin=lower, ymax=upper)) +
geom_hline(yintercept=.5,linetype='dashed',color='black') +
coord_cartesian(ylim=c(0,1),xlim=c(.5,2.5),expand=F) +
scale_y_continuous(breaks=seq(from=0,to=1,by=.1)) +
labs(x='Condition',y='Accuracy (300 to 1800ms)') +
scale_fill_manual(values=c('dodgerblue','coral2')) +
theme_bw(base_size=14) +
theme(legend.position='none')
- geom_point() to add in individual subjects' data
ggplot(byGroup,aes(x=Condition,y=Accuracy,fill=Condition)) +
geom_bar(stat='identity')+
geom_errorbar(width=.5, aes(ymin=lower, ymax=upper)) +
geom_point(data=bySub)+
geom_hline(yintercept=.5,linetype='dashed',color='black') +
coord_cartesian(ylim=c(0,1),xlim=c(.5,2.5),expand=F) +
scale_y_continuous(breaks=seq(from=0,to=1,by=.1)) +
labs(x='Condition',y='Accuracy (300 to 1800ms)') +
scale_fill_manual(values=c('dodgerblue','coral2')) +
theme_bw(base_size=14) +
theme(legend.position='none')
- fix geom_point() to add some horizontal jitter so we can more clearly see overlapping data points, and increase the y-axis so points are not clipped
ggplot(byGroup,aes(x=Condition,y=Accuracy,fill=Condition)) +
geom_bar(stat='identity')+
geom_errorbar(width=.5, aes(ymin=lower, ymax=upper)) +
geom_point(data=bySub, position=position_jitter(width = .05))+
geom_hline(yintercept=.5,linetype='dashed',color='black') +
coord_cartesian(ylim=c(0,1.05),xlim=c(.5,2.5),expand=F) +
scale_y_continuous(breaks=seq(from=0,to=1,by=.1)) +
labs(x='Condition',y='Accuracy (300 to 1800ms)') +
scale_fill_manual(values=c('dodgerblue','coral2')) +
theme_bw(base_size=14) +
theme(legend.position='none')
I have moved away from bar plots with jittered data points, because I don’t think they do a good job conveying the distribution across participants. I also think there’s some research showing biases that lead folks to misinterpret barplots. So, I’ve shifted to using violin plots instead.
This involves a few changes:
- use geom_violin() instead of geom_point() for the bySub data, move this to be plotted first, and make it slightly transparent (alpha=.8)
- use geom_point() instead of geom_bar() for the byGroup data and modify the aesthetics
- reduce the width of the error bars
ggplot(byGroup,aes(x=Condition,y=Accuracy,fill=Condition)) +
geom_violin(data=bySub,alpha=.8)+
geom_point(shape=16,fill='black',colour='black',size=2)+
geom_errorbar(width=.2, aes(ymin=lower, ymax=upper)) +
geom_hline(yintercept=.5,linetype='dashed',color='black') +
coord_cartesian(ylim=c(0,1.05),xlim=c(.5,2.5),expand=F) +
scale_y_continuous(breaks=seq(from=0,to=1,by=.1)) +
labs(x='Condition',y='Accuracy (300 to 1800ms)') +
scale_fill_manual(values=c('dodgerblue','coral2')) +
theme_bw(base_size=14) +
theme(legend.position='none')
There are several different models we can use when analyzing the data: (1) a linear mixed effects model on the trial-level accuracies (byTrial), (2) a linear mixed effects model on the subject-level accuracies (bySub), or (3) a simple linear regression on a subject-level difference score.
As far as I can tell, there is no established preference in the literature between the first two options. But here are some things to consider:
1. (pro mixed effect at subject level) Linear models assume that the data are normally distributed, which means that the DVs you are trying to predict (either accuracies at the individual trial level or accuracies at the subject level) should be between 0.3 and 0.7, with only a few cases near the 0 and 1 boundaries. These extreme values in accuracy are more common at the individual trial level: it's more likely to have an individual trial where the child looked at the target image for the entire 1.5 second window than it is to have an individual child who looked at the target image for the entire 1.5 second window on all 8 trials in the condition.
2. (pro mixed effect at the trial level) The linear mixed effects model at the subject level is discarding information, specifically inter-trial variability. For example, subjects 1 and 2 may both have 50% accuracy averaged across the 8 trials in the post-switch condition, but subject 1 was consistent (50% accuracy on all 8 trials) while subject 2 was inconsistent (100% accuracy on 4 trials and 0% accuracy on the other 4 trials). Removing this variance in the model at the subject level reduces the standard error, which will likely result in smaller p-values. Moreover, the linear mixed effects model at the subject level is essentially operating under the assumption that each trial is a repeated sample from the same distribution and we can therefore average across them. If, however, we expect some trials to be more difficult than others (e.g., familiar words with later age of acquisition), then we really should keep the data at the individual trial level and consider including a random effect for items (more on this later; a sketch follows this list).
3. (pro linear model) There is no transparent or agreed-upon way to determine degrees of freedom and effect sizes for linear mixed effects models. Therefore, we'll be using approximations when conducting tests of significance (more on this later). Generally, the different approximations will yield similar patterns of results, but not always (especially if you have marginal effects). As you'll see, the linear models are clumsier to fit (we'll be fitting a difference score rather than our actual accuracies), but in some cases, especially if you want to determine effect sizes for power analyses, they may be better.
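To make the item random effect mentioned in point 2 concrete, here is a sketch of what a crossed random intercept for items could look like. Note that our byTrial data frame dropped the Target column, so we would have to keep it when aggregating; this model is for illustration only and is not fit in this tutorial:

# Keep Target when aggregating to the trial level so it can serve as an item identifier
byTrialItems = d %>%
  filter(percentMissingFrames < 50,
         TimeC >= 300 & TimeC <= 1800) %>%
  group_by(Sub.Num, Condition, Tr.Num, Target) %>%
  summarise(Accuracy = mean(Accuracy, na.rm = TRUE))

# Random intercepts for both participants and target words (crossed random effects)
m_items = lmer(Accuracy - .5 ~ Condition + (1|Sub.Num) + (1|Target), data = byTrialItems)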
Let’s walk through each of the different models.
Before we fit our model let’s convert the independent variable (Condition) into a factor.
Factor syntax:
- we re-assign the values of Condition to itself after converting them to be a factor(). The order in which we specify each level in the factor, c('pre-switch','post-switch'), is the order in which they will appear in tables and plots (if we had left them as strings, R would otherwise display them in alphabetical order). Recall that we previously did this when making our plots, so we don't actually need to run this first command again.
- we set the contrasts for each level so that cells where Condition is 'pre-switch' receive a value of 0 and cells where Condition is 'post-switch' receive a value of 1. This has several effects on our model results:
- first, the beta coefficient for Condition tells us how much accuracy increases for a 1-point increase in Condition. with our contrasts specified in the order that they are, this is how much accuracy changes when moving from the pre-switch to the post-switch condition.
- second, the contrast coding affects lower-order effects in our model. The beta coefficient for our intercept is the average accuracy where Condition = 0; with our contrasts specified as they are, the intercept will tell us the average accuracy for the pre-switch condition.
- we assign an attribute label to the contrasts as :pre because we centered our factor on the pre-switch condition
# byTrial$Condition = factor(byTrial$Condition,c('pre-switch','post-switch'))
contrasts(byTrial$Condition) = c(0,1)
colnames(attr(byTrial$Condition,"contrasts")) = ':pre'
Now we fit a linear mixed effects model using the lmer() function from the lme4 package.
lmer() syntax:
- we are assigning the output of the model to a new variable, m
- Accuracy ~ specifies the dependent variable that we are predicting
- we subtract .5 from Accuracy so that the intercept will compare children's performance against chance. Without this adjustment, the intercept would tell us whether children's accuracy is significantly greater than 0, but we really care about whether children's accuracy is significantly greater than 0.5
- Condition is the fixed effect. The model will also include an intercept by default; if for some reason you do not want an intercept you would add 0 + Condition instead
- for the purposes of this tutorial we are not including other fixed effects (like children's performance on the DCCS, a measure of Executive Function), but if we did we could add one using + DCCS. If we wanted to include the interaction between Condition and DCCS we could do that by additionally adding + Condition:DCCS, or more simply we could specify the entire fixed effect structure using Condition*DCCS; the * tells R to include all fixed effects and possible interactions
- (1|Sub.Num) adds a random intercept for each participant. This is necessary to account for the non-independence when we have multiple data points per participant (e.g., a child who is very accurate on trial 1 will likely be very accurate on trial 2, because they have strong language skills)
- there's some advice to 'keep it maximal' with random effects. If we followed this recommendation we should also include a random effect for all within-subject fixed effects in our model (i.e., Condition). To do this we would simply include (1+Condition|Sub.Num). This is another gray area in the research literature, and there are some statisticians (e.g., Jake Oleson at Iowa) who argue that the full random effects structure often does not improve model fit and in many models is not even possible (as we'll see later). We'll fit the model with both random effect structures to see how the results change
- finally, data=byTrial tells lmer() which dataframe to use
m = lmer(Accuracy-.5 ~ Condition + (1|Sub.Num),data=byTrial)
If we look at the summary of our model results, notice that there are t-values but no p-values. This was an intentional choice by the lme4 team, because they want you to be aware of the fact that you need to choose an approximation method to estimate degrees of freedom and therefore significance.
summary(m)
## Linear mixed model fit by REML ['lmerMod']
## Formula: Accuracy - 0.5 ~ Condition + (1 | Sub.Num)
## Data: byTrial
##
## REML criterion at convergence: -254.3
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.2862 -0.4058 0.4168 0.6330 0.8744
##
## Random effects:
## Groups Name Variance Std.Dev.
## Sub.Num (Intercept) 0.00000000000000000006936 0.0000000002634
## Residual 0.04132569348100680051150 0.2032872191777
## Number of obs: 766, groups: Sub.Num, 56
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 0.37133 0.01040 35.701
## Condition:pre -0.04908 0.01469 -3.341
##
## Correlation of Fixed Effects:
## (Intr)
## Conditin:pr -0.708
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see ?isSingular
Notice that our random intercept is accounting for 0 variance. This is the reason that the model fit generated the warning: boundary (singular) fit: see ?isSingular. The model was able to converge on estimates for our fixed effects, but the random effect structure isn't doing much for us (one reason to consider just conducting a linear regression later).
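If you prefer to check this programmatically rather than reading the summary output, lme4 provides helpers for this (a quick sketch):

# Inspect the estimated random-effect variances and test for a singular fit
VarCorr(m)    # the Sub.Num intercept SD is essentially 0 here
isSingular(m) # TRUE when a variance (or correlation) is estimated at its boundary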
To approximate degrees of freedom and significance, I prefer to use the Kenward-Roger approximation (via Anova() from the car package):
Anova(m,type=3,test='F')
## Analysis of Deviance Table (Type III Wald F tests with Kenward-Roger df)
##
## Response: Accuracy - 0.5
## F Df Df.res Pr(>F)
## (Intercept) 1272.932 1 195.20 < 0.00000000000000022 ***
## Condition 11.156 1 720.99 0.0008808 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
And here are our model results if we include the full random effects structure:
m = lmer(Accuracy-.5 ~ Condition + (1+Condition|Sub.Num),data=byTrial)
## Linear mixed model fit by REML ['lmerMod']
## Formula: Accuracy - 0.5 ~ Condition + (1 + Condition | Sub.Num)
## Data: byTrial
##
## REML criterion at convergence: -255.7
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.3457 -0.3966 0.4021 0.6638 1.0619
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## Sub.Num (Intercept) 0.0002186 0.01479
## Condition:pre 0.0025768 0.05076 -1.00
## Residual 0.0405792 0.20144
## Number of obs: 766, groups: Sub.Num, 56
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 0.37105 0.01050 35.333
## Condition:pre -0.04833 0.01610 -3.001
##
## Correlation of Fixed Effects:
## (Intr)
## Conditin:pr -0.710
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see ?isSingular
## Analysis of Deviance Table (Type III Wald F tests with Kenward-Roger df)
##
## Response: Accuracy - 0.5
## F Df Df.res Pr(>F)
## (Intercept) 1244.0130 1 52.636 < 0.00000000000000022 ***
## Condition 8.9797 1 53.462 0.004133 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Notice that neither random effect (intercept or condition slope) is accounting for much variance (<.003) and we again get the isSingular warning.
The estimated fixed effect changes slightly as does the p-value, but the overall pattern is the same. Notice, however, that we have a more appropriate estimate for the degrees of freedom for the fixed effect of Condition (56 participants - 1 intercept - 1 condition effect = 54).
I would write this up as something like:
There was a significant effect of condition on children’s accuracy, b = -0.05, F(1,53.5)=8.98, p < .01. Children’s accuracy in familiar word recognition was higher before the dimensional switch (M = 0.87, SD = 0.08) than after the dimensional switch (M = 0.83, SD = 0.09).
Recall that the intercept is for the pre-switch condition. To compare accuracy in the post-switch condition to chance we just re-center the contrasts for our factor and refit the model:
contrasts(byTrial$Condition) = c(1,0)
colnames(attr(byTrial$Condition,"contrasts")) = ':post'
m = lmer(Accuracy-.5 ~ Condition + (1+Condition|Sub.Num),data=byTrial)
## Linear mixed model fit by REML ['lmerMod']
## Formula: Accuracy - 0.5 ~ Condition + (1 + Condition | Sub.Num)
## Data: byTrial
##
## REML criterion at convergence: -255.7
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.3457 -0.3966 0.4021 0.6637 1.0620
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## Sub.Num (Intercept) 0.001295 0.03598
## Condition:post 0.002578 0.05077 -1.00
## Residual 0.040579 0.20144
## Number of obs: 766, groups: Sub.Num, 56
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 0.32272 0.01139 28.345
## Condition:post 0.04833 0.01611 3.001
##
## Correlation of Fixed Effects:
## (Intr)
## Conditn:pst -0.760
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see ?isSingular
## Analysis of Deviance Table (Type III Wald F tests with Kenward-Roger df)
##
## Response: Accuracy - 0.5
## F Df Df.res Pr(>F)
## (Intercept) 801.2513 1 53.223 < 0.00000000000000022 ***
## Condition 8.9792 1 53.462 0.004134 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Another approximation method is the Satterthwaite approximation, which we can get by using the lmerTest package to fit our model:
m = lmerTest::lmer(Accuracy-.5 ~ Condition + (1+Condition|Sub.Num),data=byTrial)
summary(m)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Accuracy - 0.5 ~ Condition + (1 + Condition | Sub.Num)
## Data: byTrial
##
## REML criterion at convergence: -255.7
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.3457 -0.3966 0.4021 0.6637 1.0620
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## Sub.Num (Intercept) 0.001295 0.03598
## Condition:post 0.002578 0.05077 -1.00
## Residual 0.040579 0.20144
## Number of obs: 766, groups: Sub.Num, 56
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 0.32272 0.01139 54.61652 28.345 < 0.0000000000000002 ***
## Condition:post 0.04833 0.01611 58.09008 3.001 0.00396 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## Conditn:pst -0.760
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see ?isSingular
This package automatically includes the test of significance (using t- rather than F-values). But again, the pattern of results is very similar; for a fixed effect with a single degree of freedom, the F value is approximately the square of the t value (3.001^2 ≈ 9.0).
Finally, here's what it would look like if we instead fit the model using the nlme package:
m = nlme::gls(Accuracy-.5 ~ Condition, corr=nlme::corCompSymm(form=~Condition|Sub.Num),
na.action=na.omit,data=byTrial,method='REML')
summary(m)
## Generalized least squares fit by REML
## Model: Accuracy - 0.5 ~ Condition
## Data: byTrial
## AIC BIC logLik
## -246.3016 -227.7473 127.1508
##
## Correlation Structure: Compound symmetry
## Formula: ~Condition | Sub.Num
## Parameter estimate(s):
## Rho
## -0.002257671
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 0.3221654 0.01030141 31.273917 0.0000
## Condition:post 0.0490928 0.01470472 3.338574 0.0009
##
## Correlation:
## (Intr)
## Condition:post -0.712
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -4.2859237 -0.4054255 0.4172068 0.6333110 0.8748100
##
## Residual standard error: 0.2032837
## Degrees of freedom: 766 total; 764 residual
Caution: I am the least familiar with using gls() for mixed effects modeling. There are strong pros to using this approach, because you can use different structures for the correlations among the repeated measures within a subject:
- corCompSymm assumes there is compound symmetry
- corSymm allows for an unstructured covariance matrix where the correlations are not assumed to be equal
- corAR1 has the correlations decay over time
I do not know exactly what these differences entail or how to determine which is most appropriate for your data set. But I mention them in case folks are interested in following up on this on their own to learn more (it is on my to-learn list in the near future). gls() also works with simple linear regression (you remove the correlation matrix), which is nice.
A brief aside: both lmer() and lm() models assume that the variance is the same between each level of your fixed effect. This might not be the case, however, especially if you are comparing children who are typically developing (who usually have less variability in their performance) to children with atypical development (where there is usually a wider range in performance across children within the group). With gls() you can modify the model so that it does not assume equal variance between levels of your fixed effect by adding weights=nlme::varIdent(form=~1|Dx). Or, in our model, if there was more variability in post-switch compared to pre-switch performance (which is not the case based on our earlier plots), we could add weights=nlme::varIdent(form=~1|Condition). Then you simply compare this new model, which assumes heterogeneous variance, to the original gls() model that assumed homogeneous variance, and determine whether one is a better fit by checking whether its AIC value is lower than the other's (usually by at least 2 points).
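Here is a sketch of what that comparison could look like with our data, refitting the gls() model from above with heterogeneous variances and comparing AICs (illustrative only; given our plots we would not expect the heterogeneous model to win):

# Allow a separate residual variance for each condition, then compare AICs
m_hetero = nlme::gls(Accuracy-.5 ~ Condition,
                     corr = nlme::corCompSymm(form = ~Condition|Sub.Num),
                     weights = nlme::varIdent(form = ~1|Condition),
                     na.action = na.omit, data = byTrial, method = 'REML')
AIC(m, m_hetero) # lower AIC (by ~2+ points) indicates the better-fitting model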
To simplify things, we will just focus on using the lme4 package and the Kenward-Roger approximation for linear mixed effects modeling moving forward.
We again need to update our Condition column to be a factor, since we are using a new dataframe:
# bySub$Condition = factor(bySub$Condition,c('pre-switch','post-switch'))
contrasts(bySub$Condition) = c(0,1)
colnames(attr(bySub$Condition,"contrasts")) = ':pre'
Let’s fit the model with just the random intercept:
m = lmer(Accuracy-.5 ~ Condition + (1|Sub.Num),data=bySub)
## Linear mixed model fit by REML ['lmerMod']
## Formula: Accuracy - 0.5 ~ Condition + (1 | Sub.Num)
## Data: bySub
##
## REML criterion at convergence: -222.2
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.76374 -0.62793 0.06122 0.76353 2.03449
##
## Random effects:
## Groups Name Variance Std.Dev.
## Sub.Num (Intercept) 0.000000 0.00000
## Residual 0.007218 0.08496
## Number of obs: 112, groups: Sub.Num, 56
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 0.37249 0.01135 32.809
## Condition:pre -0.04534 0.01606 -2.824
##
## Correlation of Fixed Effects:
## (Intr)
## Conditin:pr -0.707
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see ?isSingular
Notice that our random intercept is again accounting for 0 variance and we again get the warning: boundary (singular) fit: see ?isSingular.
Anova(m,type=3,test='F')
## Analysis of Deviance Table (Type III Wald F tests with Kenward-Roger df)
##
## Response: Accuracy - 0.5
## F Df Df.res Pr(>F)
## (Intercept) 1076.4034 1 110 < 0.00000000000000022 ***
## Condition 7.9747 1 55 0.006594 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The pattern of results is broadly similar to what we observed before:
There was a significant effect of condition on children’s accuracy, b = -0.05, F(1,55)=7.97, p < .01.
When analyzing the data at the subject level, however, we cannot include the full random effects structure, because we only have 2 data points per participant (1 accuracy averaging across the pre-switch trials and 1 accuracy averaging across the post-switch trials). Including the random slope for condition would overfit the model, including just as many random effects as we have observations. R will generate an error message if you try to fit this model (which prevents me from knitting the final markdown output)
# m = lmer(Accuracy-.5 ~ Condition + (1+Condition|Sub.Num),data=bySub)
In order to use simple linear regression, we can only have 1 data point per participant. We can finagle this by creating a new dependent variable: subtracting each child's average accuracy in the pre-switch condition from their average accuracy in the post-switch condition. This difference score then becomes our DV, and the intercept in the model will tell us whether the difference score differs significantly from 0 (i.e., whether there's a significant effect of condition).
In order to do this, we need to use a new tidyr feature:
- pivot_wider() will convert the data frame from long format (multiple rows per subject, one for each condition) into wide format (one row per subject, with a column for each condition)
- we need to first remove the Trials column using select(), because the differing number of trials for each condition would prevent a clean pivot (each subject would end up with multiple, partially empty rows)
- we then specify that we will use the values from the Condition column in our original data frame to label the new columns in our new data frame, and that the values for these new columns will come from the Accuracy column in our original data frame
bySubWide = bySub %>%
  select(-Trials) %>%
  pivot_wider(names_from = Condition, values_from = Accuracy)
Here’s what our new data frame looks like
Sub.Num | post-switch | pre-switch |
---|---|---|
501 | 0.73 | 0.95 |
502 | 0.80 | 0.87 |
503 | 0.64 | 0.88 |
504 | 0.71 | 0.87 |
505 | 0.95 | 0.94 |
506 | 0.89 | 0.87 |
Finally, we just use mutate() to calculate the difference score between these columns:
bySubWide = bySubWide %>% mutate(diff = `post-switch` - `pre-switch`)
And now we can fit our linear regression model:
m = lm(diff ~ 1, data =bySubWide)
summary(m)
##
## Call:
## lm(formula = diff ~ 1, data = bySubWide)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.27341 -0.06768 -0.00319 0.06931 0.35489
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.04534 0.01615 -2.807 0.0069 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1209 on 55 degrees of freedom
I use the lm() syntax because we can make the model more complicated by adding in other fixed effects (e.g., DCCS), which would essentially test whether there is a significant interaction between our effect of condition and that fixed effect.
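For example, if we had each child's DCCS score merged into bySubWide (hypothetical; there is no DCCS column in our current data frames), the model might look like this:

# Hypothetical sketch: DCCS would need to be added to bySubWide first
# m = lm(diff ~ DCCS, data = bySubWide)
# summary(m)
# The DCCS slope would test whether the pre/post difference changes with DCCS
# (i.e., a Condition x DCCS interaction); the intercept would test the condition
# effect at DCCS = 0, so centering DCCS first keeps the intercept interpretable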
If we aren't including any other fixed effects, however, our lm() is essentially just a t-test:
t.test(bySubWide$`pre-switch`, bySubWide$`post-switch`, paired = TRUE)
##
## Paired t-test
##
## data: bySubWide$`pre-switch` and bySubWide$`post-switch`
## t = 2.8071, df = 55, p-value = 0.006903
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.01297110 0.07771226
## sample estimates:
## mean of the differences
## 0.04534168
t.test(bySubWide$diff)
##
## One Sample t-test
##
## data: bySubWide$diff
## t = -2.8071, df = 55, p-value = 0.006903
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## -0.07771226 -0.01297110
## sample estimates:
## mean of x
## -0.04534168