This script uses the following packages (ggplot2 and tidyr are commented out here because they are presumably already loaded, e.g., via the tidyverse):
# library(ggplot2)
# library(tidyr)
library(purrr)
library(foreach)
library(doSNOW)
library(lme4)    # for glmer(), used in the final section
## 'data.frame': 173690 obs. of 11 variables:
## $ Sub.Num : int 501 501 501 501 501 501 501 501 501 501 ...
## $ Tr.Num : int 2 2 2 2 2 2 2 2 2 2 ...
## $ Order : chr "CN-1A" "CN-1A" "CN-1A" "CN-1A" ...
## $ Condition : chr "pre-switch" "pre-switch" "pre-switch" "pre-switch" ...
## $ Target : chr "slide" "slide" "slide" "slide" ...
## $ Target.Side : chr "l" "l" "l" "l" ...
## $ TimeC : int -3067 -3033 -3000 -2967 -2933 -2900 -2867 -2833 -2800 -2767 ...
## $ Accuracy : num NA NA NA NA NA NA NA NA NA NA ...
## $ Block : chr "1A" "1A" "1A" "1A" ...
## $ percentMissingFrames: num 10 10 10 10 10 10 10 10 10 10 ...
## $ Source : chr "tobii" "tobii" "tobii" "tobii" ...
## Sub.Num Tr.Num Order Condition Target Target.Side TimeC Accuracy Block
## 1 501 2 CN-1A pre-switch slide l -3067 NA 1A
## 2 501 2 CN-1A pre-switch slide l -3033 NA 1A
## 3 501 2 CN-1A pre-switch slide l -3000 NA 1A
## 4 501 2 CN-1A pre-switch slide l -2967 NA 1A
## 5 501 2 CN-1A pre-switch slide l -2933 NA 1A
## 6 501 2 CN-1A pre-switch slide l -2900 NA 1A
## percentMissingFrames Source
## 1 10 tobii
## 2 10 tobii
## 3 10 tobii
## 4 10 tobii
## 5 10 tobii
## 6 10 tobii
In these analyses, we will test for significance using cluster-based permutation analyses: we will fit models (either t-tests or logistic mixed effects) for each time bin throughout the entire critical window (i.e., every 33 ms from 300 to 1800 ms after the onset of the critical word).
We must account for multiple comparisons, however, because if there is in fact no difference and we complete 46 tests, we will find a significant effect on average 2.3 times (i.e., 5% of 46).
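To make the arithmetic concrete (a quick back-of-the-envelope check, treating the 46 tests as independent, which neighboring time bins are not):
0.05 * 46   # expected number of false positives: 2.3
1 - 0.95^46 # chance of at least one false positive: ~0.91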
Bonferroni corrections are often used, but are too conservative.
An alternative method is to randomly shuffle our data 1,000 times, refitting the models for each time bin each time. For each permutation, we will extract the largest cluster of contiguous significant effects. We can then test the significance of our observed effect by comparing how likely we are to observe an effect of that magnitude (i.e., what percentage of the random permutations exceed the size of our observed effect).
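Concretely, the permutation test at the end of this process boils down to one line of R. A minimal sketch (with hypothetical objects: `observed_auc` for the statistic of our largest real cluster and `null_aucs` for the 1,000 maximum cluster statistics from the permuted data):
p_perm <- mean(abs(null_aucs) >= abs(observed_auc)) # proportion of permutations as extreme as the observed cluster
p_perm < .05 # significant if fewer than 5% are as extreme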
Randomization occurs at the Condition level. An important constraint is that any imbalances that naturally occur in the data are preserved. For instance, there are 7 pre-switch trials (1 excluded due to too much missing data) and 8 post-switch trials for subject 501. Therefore, when randomly assigning trials to be in the pre-switch or post-switch condition in any given permutation for subject 501 we will always maintain this same imbalance.
We’ll start by using t tests for each time bin. These models fit rapidly and can therefore easily be repeated 1,000 times.
In the final section I use logistic mixed effects models for each time bin. This approach keeps the data at the individual trial level for each time bin (i.e., 1’s and 0’s). The cost, however, is that these models are more complex and take longer to fit. This difference in time is trivial when fitting just one model, but quickly accumulates when conducting many permutations. A larger concern is that these models may fail to converge, because conducting analyses for each time frame drastically reduces the amount of data we have (i.e., a maximum of 16 data points per participant). As we’ll see later, the current data set does not support the use of logistic mixed effects models (due to convergence issues).
Before we start analyzing the data, however, let’s first plot a time course of the changes in children’s accuracy during our critical window. Luckily the code is nearly identical to our mean accuracy code, except we are keeping `TimeC` when aggregating and changing our x-axis to be `TimeC` rather than `Condition`:
bySub = d %>%
  filter(TimeC > -2000 & TimeC < 3000) %>%
  group_by(Sub.Num, Condition, TimeC) %>%
  summarise(
    Trials = sum(!is.na(Accuracy)),
    Accuracy = mean(Accuracy, na.rm = TRUE))
Sub.Num | Condition | TimeC | Trials | Accuracy |
---|---|---|---|---|
501 | post-switch | 0 | 8 | 0.38 |
501 | post-switch | 33 | 7 | 0.29 |
501 | post-switch | 67 | 7 | 0.29 |
501 | post-switch | 100 | 7 | 0.29 |
501 | post-switch | 133 | 6 | 0.33 |
501 | post-switch | 167 | 7 | 0.29 |
501 | post-switch | 200 | 7 | 0.29 |
501 | post-switch | 233 | 7 | 0.29 |
501 | post-switch | 267 | 7 | 0.43 |
501 | post-switch | 300 | 7 | 0.43 |
501 | post-switch | 333 | 6 | 0.50 |
501 | post-switch | 367 | 6 | 0.50 |
501 | post-switch | 400 | 6 | 0.67 |
501 | post-switch | 433 | 6 | 0.67 |
501 | post-switch | 467 | 7 | 0.71 |
501 | post-switch | 500 | 7 | 0.71 |
501 | post-switch | 533 | 7 | 0.71 |
501 | post-switch | 567 | 6 | 0.83 |
501 | post-switch | 600 | 8 | 0.75 |
501 | post-switch | 633 | 8 | 0.75 |
byGroup = bySub %>%
group_by(Condition,TimeC) %>%
summarise(
Subjects = sum(!is.na(Accuracy)),
SD = sd(Accuracy,na.rm=TRUE),
SE = SD/sqrt(Subjects),
Accuracy = mean(Accuracy,na.rm=TRUE),
lower=Accuracy-SE,
upper=Accuracy+SE)
byGroup$Condition = factor(byGroup$Condition,c("pre-switch","post-switch"))
Condition | TimeC | Subjects | SD | SE | Accuracy | lower | upper |
---|---|---|---|---|---|---|---|
post-switch | 0 | 55 | 0.21 | 0.03 | 0.56 | 0.53 | 0.59 |
post-switch | 33 | 55 | 0.22 | 0.03 | 0.56 | 0.53 | 0.59 |
post-switch | 67 | 55 | 0.23 | 0.03 | 0.58 | 0.55 | 0.61 |
post-switch | 100 | 54 | 0.21 | 0.03 | 0.56 | 0.54 | 0.59 |
post-switch | 133 | 55 | 0.23 | 0.03 | 0.58 | 0.55 | 0.61 |
post-switch | 167 | 56 | 0.24 | 0.03 | 0.56 | 0.53 | 0.59 |
post-switch | 200 | 55 | 0.22 | 0.03 | 0.58 | 0.55 | 0.61 |
post-switch | 233 | 55 | 0.23 | 0.03 | 0.58 | 0.55 | 0.61 |
post-switch | 267 | 55 | 0.23 | 0.03 | 0.59 | 0.56 | 0.62 |
post-switch | 300 | 55 | 0.22 | 0.03 | 0.58 | 0.55 | 0.61 |
post-switch | 333 | 55 | 0.21 | 0.03 | 0.59 | 0.56 | 0.62 |
post-switch | 367 | 55 | 0.21 | 0.03 | 0.60 | 0.58 | 0.63 |
post-switch | 400 | 55 | 0.22 | 0.03 | 0.63 | 0.60 | 0.66 |
post-switch | 433 | 55 | 0.23 | 0.03 | 0.66 | 0.63 | 0.69 |
post-switch | 467 | 55 | 0.23 | 0.03 | 0.68 | 0.65 | 0.71 |
post-switch | 500 | 55 | 0.22 | 0.03 | 0.69 | 0.66 | 0.72 |
post-switch | 533 | 55 | 0.21 | 0.03 | 0.70 | 0.67 | 0.73 |
post-switch | 567 | 55 | 0.20 | 0.03 | 0.71 | 0.68 | 0.74 |
post-switch | 600 | 55 | 0.18 | 0.02 | 0.73 | 0.70 | 0.75 |
post-switch | 633 | 55 | 0.17 | 0.02 | 0.74 | 0.72 | 0.76 |
ggplot(byGroup, aes(x=TimeC, y=Accuracy, fill=Condition, color=Condition)) +
geom_hline(aes(yintercept=0.5),linetype='solid',color='gray') +
geom_line()+
geom_smooth(aes(ymin=lower, ymax=upper), stat="identity") +
geom_vline(aes(xintercept=0), linetype="dashed", color="gray") +
theme_bw(base_size=10) +
coord_cartesian(xlim=c(-2000,3000), ylim= c(0,1.01),expand=F) +
scale_x_continuous(breaks=seq(from=-2100,to=3000,by=300)) +
scale_y_continuous(breaks=seq(from=0,to=1,by=.1)) +
scale_fill_manual(values=c("dodgerblue","coral2")) +
scale_color_manual(values=c("dodgerblue","coral2")) +
labs(x='Time since target onset (in ms)',y='Proportion Looking to Target') +
theme(legend.justification=c(1,0),legend.position=c(1,0),legend.background=element_rect(fill= NA, color=NA),legend.title=element_blank(),legend.text = element_text(size = 11))
Let’s start off by conducting the 46 t-tests, one for each time frame during our critical window (300, 333, 367, 400, 433, 467, 500, 533, 567, 600, 633, 667, 700, 733, 767, 800, 833, 867, 900, 933, 967, 1000, 1033, 1067, 1100, 1133, 1167, 1200, 1233, 1267, 1300, 1333, 1367, 1400, 1433, 1467, 1500, 1533, 1567, 1600, 1633, 1667, 1700, 1733, 1767, 1800).
Technically we can conduct paired-sample t-tests on the `bySub` data frame, which is in long format. But this can cause problems with some data sets, including the current one. Notice how the number of subjects contributing to the average accuracy for each time frame in the `byGroup` data frame changes over time. This is because some participants were not looking at either the target or distractor image for that exact time frame (e.g., 300 ms) on all 8 trials. When all of a subject’s accuracies are NA for that time frame in that condition, their data is dropped from the analysis. This causes a problem, however, if that subject does have accuracies for that time frame in the other condition: a paired-samples t-test won’t work if there’s data without its pair. So we need to filter our data set to remove time frames where a subject has data for one condition but not the other.
The easiest way to do this is to pivot our data frame into wide format like before:
bySubWide = bySub %>%
select(-Trials) %>%
pivot_wider(names_from=Condition,values_from=Accuracy)
And then to modify each column, removing data for any time frame where data is missing in the other column:
bySubWide = bySubWide %>%
mutate(
pre = ifelse(is.na(`post-switch`),NA,`pre-switch`),
post = ifelse(is.na(`pre-switch`),NA,`post-switch`)
)
Notice now how the original column `pre-switch` had accuracy values that are dropped (converted to NAs) in the new `pre` column for those rows where we are missing data in the `post-switch` column:
Sub.Num | TimeC | post-switch | pre-switch | pre | post |
---|---|---|---|---|---|
505 | 100 | NaN | 0.25 | NA | NaN |
539 | -1967 | NaN | 0.33 | NA | NaN |
539 | -1933 | NaN | 0.33 | NA | NaN |
542 | -1967 | NaN | 1.00 | NA | NaN |
542 | -1633 | NaN | 1.00 | NA | NaN |
542 | -1600 | NaN | 1.00 | NA | NaN |
Now we can conduct the paired-sample t-tests on all 46 frames in our critical window.
note on syntax:
adding `$statistic` to the end of the `t.test()` call will return just the t-value from the t-test (we don’t need all the other information)
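Here’s a toy demonstration of that trick with made-up data (hypothetical vectors `x` and `y`):
set.seed(1)
x <- rnorm(10); y <- rnorm(10)
t.test(x, y, paired = TRUE)$statistic # returns just the named t value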
ts <- bySubWide %>%
filter(TimeC >=300 & TimeC<=1800) %>%
group_by(TimeC) %>%
summarise(vars = t.test(post, pre, paired = TRUE)$statistic) %>%
mutate(Sig = ifelse(abs(vars)>2,"yes","no"))
TimeC | vars | Sig |
---|---|---|
300 | 0.63 | no |
333 | 0.35 | no |
367 | -0.82 | no |
400 | -0.60 | no |
433 | -0.66 | no |
467 | -0.42 | no |
500 | -1.03 | no |
533 | -2.01 | yes |
567 | -2.04 | yes |
600 | -2.83 | yes |
633 | -2.87 | yes |
667 | -2.97 | yes |
700 | -2.70 | yes |
733 | -2.65 | yes |
767 | -2.44 | yes |
800 | -2.13 | yes |
833 | -2.16 | yes |
867 | -1.72 | no |
900 | -1.42 | no |
933 | -1.12 | no |
ggplot(data=ts,aes(x=TimeC,y=vars,color=Sig)) +
geom_point()+
theme_bw() +
scale_color_manual(values=c("black","red")) +
geom_hline(yintercept = 2) +
geom_hline(yintercept = -2) +
scale_x_continuous(breaks=seq(from=300,to=1800,by=300)) +
labs(x='Time since target onset (in ms)',y='t value',title='Pre-switch vs. Post-switch')
For analysis purposes, we need a function to help us extract information about the largest cluster of consecutive significant t-tests.
`get_max_cluster_window()` identifies when each cluster starts and stops and the sum of all the t values within each cluster (i.e., area under the curve = auc), then returns only the information for the cluster with the largest auc value.
note on syntax:
`rleid()` from the `data.table` package generates run-length data, creating a new column of numbers that starts at 1 and increments every time there is a change in value in the `Sig` column. For example, if `ts$Sig` had values of `ns, ns, pos, pos, ns` then `data.table::rleid(ts$Sig)` would return `1, 1, 2, 2, 3`.
`which.max(abs(d_clust$auc))` selects only the row that corresponds to the cluster with the largest absolute auc value.
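Both pieces are easy to verify interactively:
data.table::rleid(c("ns","ns","pos","pos","ns")) # 1 1 2 2 3
auc <- c(4.5, -6.2) # hypothetical cluster sums
which.max(abs(auc)) # 2: the second cluster is largest in absolute terms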
get_max_cluster_window <- function(m, threshold = 2){
  colnames(m) <- c("time", "stat")
  d_clust <- m %>%
    mutate(
      # flag each time bin as a significant positive, significant negative, or ns effect
      sig = ifelse(stat >= threshold, "pos",
            ifelse(stat <= -threshold, "neg", "ns")),
      # number each run of identical sig values so contiguous effects share a cluster id
      cluster = data.table::rleid(sig)
    ) %>%
    filter(sig != "ns") %>%
    group_by(cluster) %>%
    summarise(
      start = min(time),
      stop = max(time),
      auc = sum(stat))
  # default (all-zero) cluster in case no time bin reached the threshold
  max_cluster = data.frame(cluster=0,start=0,stop=0,auc=0)
  if(nrow(d_clust) > 0) {max_cluster = d_clust[which.max(abs(d_clust$auc)),]}
  return(max_cluster)
}
max_cluster = get_max_cluster_window(ts %>% select(TimeC,vars))
Here’s what `get_max_cluster_window()` returns when we give it our dataframe of 46 t tests:
cluster | start | stop | auc |
---|---|---|---|
2 | 533 | 833 | -24.8 |
It gives the time when the cluster starts (533 ms), when it ends (833 ms), and the sum of the 10 consecutive significant t values (auc = -24.8).
Later we’ll use this information to re-plot the time course with a horizontal line marking the significant effect of condition. First, though, we need to determine whether this cluster of t values is statistically significant. To do this, we will need to determine whether we would expect to find a cluster of this size less than 5% of the time when we conduct the same analyses on randomized data.
We need one more function to randomize the assignment of trials to condition. This function pseudo-randomly assigns each trial to be in either the pre-switch or the post-switch condition. With our two conditions, there are four possible scenarios for any given trial when randomizing: a pre-switch trial can be assigned to the pre-switch or post-switch condition, and likewise for a post-switch trial.
For any individual randomization, we may end up with a randomized condition that maintains most of the correct assignments (i.e., trials are randomized to be in the original condition). Other times we may end up with the opposite - a randomized condition that flips most of the correct assignments (i.e., trials are randomized to be in the “wrong” condition). But across many randomizations (we’ll be doing 1,000) this will wash out so that, on average, any given trial will be assigned equally often in its “correct” vs. “wrong” condition following randomization. Moreover, for the average randomization, half of the trials will be correctly assigned and half will be incorrectly assigned. What this means is that the advantage we observe in our pre-switch trials should disappear when condition is randomized (since, on average, half of the trials in the randomized “pre-switch” condition will actually be post-switch trials).
I say that this process is “pseudo-randomized”, because we have the constraint that any imbalances in the number of trials per condition that naturally occurred for each subject are preserved. Recall that there are 7 pre-switch trials (1 excluded due to too much missing data) and 8 post-switch trials for subject 501. Therefore, when randomly assigning trials to be in the pre-switch or post-switch condition in any given permutation for subject 501 we will always maintain this same imbalance.
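The core of the constraint is simple: for each subject, sample (without replacement) as many trial IDs as that subject has pre-switch trials, and relabel those trials pre-switch. A minimal sketch for one subject (hypothetical trial IDs):
trials <- 1:15 # the 15 retained trials for subject 501
n_pre <- 7 # subject 501's observed number of pre-switch trials
pre_ids <- sample(trials, n_pre) # 7 trials randomly relabeled as pre-switch
# the remaining 8 trials are relabeled post-switch, preserving the 7/8 imbalance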
This function involves a fair bit of code, so I have embedded the syntax notes within the code:
shuffConditions = function(d,seed) {
set.seed(seed) # set the random seed, which allows us to re-create the same randomization when re-running our analyses
# create dataframe with one row per trial for each participant
randCond = d %>%
filter(percentMissingFrames < 50) %>%
select(Sub.Num,Order,Tr.Num,Condition) %>%
distinct()
# determine number of trials in the pre-switch condition for each participant
numPreTr = randCond %>%
filter(Condition=='pre-switch') %>%
group_by(Sub.Num) %>%
summarise(n=length(Tr.Num))
# for each participant randomly select (w/o replacement) trials to match the number of trials in the pre-switch condition
# assign these trials to be pre-switch in the randomized condition (CondR)
# this randomized condition (CondR) will contain some mix of trials that are in fact from the pre-switch condition
# and some trials that are from the post-switch condition
new = randCond %>%
group_by(Sub.Num) %>%
nest() %>%
left_join(numPreTr) %>%
mutate(v = map2(data, n, ~sample_n(.x, .y))) %>%
unnest(v) %>%
select(-data,-n) %>%
arrange(Sub.Num,Order,Tr.Num) %>%
mutate(CondR='pre-switch')
# merge the random selection with the full list of trials
# assign trials without a randomized condition (CondR = NA) to be post-switch in the randomized condition (CondR)
randCond = left_join(randCond,new)
randCond$CondR[is.na(randCond$CondR)]='post-switch'
d = left_join(d,randCond)
return(d) # return the full data frame with the new randomized condition column (CondR)
}
note on syntax:
one element that I would like to highlight is the use of `nest()` and then `map2()`.
using `nest()` we have made a data frame with 1 row per participant, where the column `data` is actually a dataframe with the order, trial number, and condition for each trial that participant completed:
Sub.Num | data | n |
---|---|---|
501 | CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , 2 , 3 , 4 , 6 , 7 , 8 , 9 , 10 , 11 , 12 , 13 , 14 , 15 , 16 , 17 , pre-switch , pre-switch , pre-switch , pre-switch , pre-switch , pre-switch , pre-switch , post-switch, post-switch, post-switch, post-switch, post-switch, post-switch, post-switch, post-switch | 7 |
502 | CN-1B , CN-1B , CN-1B , CN-1B , CN-1B , CN-1B , CN-1B , CN-1B , CN-1B , CN-1B , CN-1B , CN-1B , CN-1B , CN-1B , 2 , 5 , 6 , 7 , 8 , 9 , 10 , 11 , 12 , 13 , 14 , 15 , 16 , 17 , pre-switch , pre-switch , pre-switch , pre-switch , pre-switch , pre-switch , post-switch, post-switch, post-switch, post-switch, post-switch, post-switch, post-switch, post-switch | 6 |
503 | CN-2A , CN-2A , CN-2A , CN-2A , CN-2A , CN-2A , CN-2A , CN-2A , CN-2A , CN-2A , CN-2A , CN-2A , CN-2A , CN-2A , CN-2A , 2 , 4 , 5 , 6 , 7 , 8 , 9 , 10 , 11 , 12 , 13 , 14 , 15 , 16 , 17 , pre-switch , pre-switch , pre-switch , pre-switch , pre-switch , pre-switch , pre-switch , post-switch, post-switch, post-switch, post-switch, post-switch, post-switch, post-switch, post-switch | 7 |
then, using `map2()`, we have randomly sampled without replacement `n` rows (from the column `n`) from that data frame for each participant. this new data frame is saved as a new column `v`. these are the trials that we are going to randomly assign to be in the ‘pre-switch’ condition:
Sub.Num | data | n | v |
---|---|---|---|
501 | CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , 2 , 3 , 4 , 6 , 7 , 8 , 9 , 10 , 11 , 12 , 13 , 14 , 15 , 16 , 17 , pre-switch , pre-switch , pre-switch , pre-switch , pre-switch , pre-switch , pre-switch , post-switch, post-switch, post-switch, post-switch, post-switch, post-switch, post-switch, post-switch | 7 | CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , 14 , 2 , 8 , 4 , 9 , 12 , 13 , post-switch, pre-switch , pre-switch , pre-switch , pre-switch , post-switch, post-switch |
502 | CN-1B , CN-1B , CN-1B , CN-1B , CN-1B , CN-1B , CN-1B , CN-1B , CN-1B , CN-1B , CN-1B , CN-1B , CN-1B , CN-1B , 2 , 5 , 6 , 7 , 8 , 9 , 10 , 11 , 12 , 13 , 14 , 15 , 16 , 17 , pre-switch , pre-switch , pre-switch , pre-switch , pre-switch , pre-switch , post-switch, post-switch, post-switch, post-switch, post-switch, post-switch, post-switch, post-switch | 6 | CN-1B , CN-1B , CN-1B , CN-1B , CN-1B , CN-1B , 10 , 8 , 9 , 14 , 5 , 13 , post-switch, pre-switch , pre-switch , post-switch, pre-switch , post-switch |
503 | CN-2A , CN-2A , CN-2A , CN-2A , CN-2A , CN-2A , CN-2A , CN-2A , CN-2A , CN-2A , CN-2A , CN-2A , CN-2A , CN-2A , CN-2A , 2 , 4 , 5 , 6 , 7 , 8 , 9 , 10 , 11 , 12 , 13 , 14 , 15 , 16 , 17 , pre-switch , pre-switch , pre-switch , pre-switch , pre-switch , pre-switch , pre-switch , post-switch, post-switch, post-switch, post-switch, post-switch, post-switch, post-switch, post-switch | 7 | CN-2A , CN-2A , CN-2A , CN-2A , CN-2A , CN-2A , CN-2A , 14 , 10 , 5 , 17 , 2 , 16 , 13 , post-switch, post-switch, pre-switch , post-switch, pre-switch , post-switch, post-switch |
we use `unnest()` to re-convert the dataframes in column `v` into multiple rows, so there is no longer just one row of data per subject:
Sub.Num | data | n | Order | Tr.Num | Condition |
---|---|---|---|---|---|
501 | CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , 2 , 3 , 4 , 6 , 7 , 8 , 9 , 10 , 11 , 12 , 13 , 14 , 15 , 16 , 17 , pre-switch , pre-switch , pre-switch , pre-switch , pre-switch , pre-switch , pre-switch , post-switch, post-switch, post-switch, post-switch, post-switch, post-switch, post-switch, post-switch | 7 | CN-1A | 14 | post-switch |
501 | CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , 2 , 3 , 4 , 6 , 7 , 8 , 9 , 10 , 11 , 12 , 13 , 14 , 15 , 16 , 17 , pre-switch , pre-switch , pre-switch , pre-switch , pre-switch , pre-switch , pre-switch , post-switch, post-switch, post-switch, post-switch, post-switch, post-switch, post-switch, post-switch | 7 | CN-1A | 2 | pre-switch |
501 | CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , CN-1A , 2 , 3 , 4 , 6 , 7 , 8 , 9 , 10 , 11 , 12 , 13 , 14 , 15 , 16 , 17 , pre-switch , pre-switch , pre-switch , pre-switch , pre-switch , pre-switch , pre-switch , post-switch, post-switch, post-switch, post-switch, post-switch, post-switch, post-switch, post-switch | 7 | CN-1A | 8 | pre-switch |
finally, we remove the column `data` that contained the nested data frame, which leaves us with a dataframe that only contains the trials that were randomly assigned to be in the pre-switch condition:
Sub.Num | Order | Tr.Num | Condition | CondR |
---|---|---|---|---|
501 | CN-1A | 2 | pre-switch | pre-switch |
501 | CN-1A | 4 | pre-switch | pre-switch |
501 | CN-1A | 8 | pre-switch | pre-switch |
501 | CN-1A | 9 | pre-switch | pre-switch |
501 | CN-1A | 12 | post-switch | pre-switch |
501 | CN-1A | 13 | post-switch | pre-switch |
501 | CN-1A | 14 | post-switch | pre-switch |
502 | CN-1B | 5 | pre-switch | pre-switch |
502 | CN-1B | 8 | pre-switch | pre-switch |
502 | CN-1B | 9 | pre-switch | pre-switch |
502 | CN-1B | 10 | post-switch | pre-switch |
502 | CN-1B | 13 | post-switch | pre-switch |
502 | CN-1B | 14 | post-switch | pre-switch |
503 | CN-2A | 2 | pre-switch | pre-switch |
503 | CN-2A | 5 | pre-switch | pre-switch |
503 | CN-2A | 10 | post-switch | pre-switch |
503 | CN-2A | 13 | post-switch | pre-switch |
503 | CN-2A | 14 | post-switch | pre-switch |
503 | CN-2A | 16 | post-switch | pre-switch |
503 | CN-2A | 17 | post-switch | pre-switch |
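A quick sanity check (a sketch, run on the full data frame `d`) confirms that the randomization preserves each subject’s trial counts:
dShuff = shuffConditions(d, 1)
dShuff %>%
  filter(Sub.Num == 501, !is.na(CondR)) %>% # CondR is NA for the excluded trial
  distinct(Tr.Num, CondR) %>%
  count(CondR) # 7 pre-switch and 8 post-switch, matching the original imbalance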
Now we can repeat our previous steps in aggregating the data, but adding in our new `shuffConditions()` function, which is embedded in the tidyverse pipeline. We are passing in the filtered data frame, which is the first required argument for this function. And we are manually passing in a value of 1 for the random seed (when using a for loop in the next section we will instead pass in the `i` that increments from 1 to 1,000 on each iteration through the loop).
bySub = d %>%
filter(TimeC > -2000 & TimeC < 3000) %>%
shuffConditions(1) %>%
group_by(Sub.Num,CondR,TimeC) %>%
summarise(Accuracy = mean(Accuracy,na.rm=T))
byGroup = bySub %>%
group_by(CondR,TimeC) %>%
summarise(
Subjects = sum(!is.na(Accuracy)),
SD = sd(Accuracy,na.rm=TRUE),
SE = SD/sqrt(Subjects),
Accuracy = mean(Accuracy,na.rm=TRUE),
lower=Accuracy-SE,
upper=Accuracy+SE)
byGroup$CondR = factor(byGroup$CondR, c("pre-switch","post-switch"))
bySubWide = bySub %>%
pivot_wider(names_from=CondR,values_from=Accuracy) %>%
mutate(
pre = ifelse(is.na(`post-switch`),NA,`pre-switch`),
post = ifelse(is.na(`pre-switch`),NA,`post-switch`)
)
ts <- bySubWide %>%
filter(TimeC >=300 & TimeC<=1800) %>%
group_by(TimeC) %>%
summarise(vars = t.test(post, pre, paired = TRUE)$statistic) %>%
mutate(Sig = ifelse(abs(vars)>2,"yes","no"))
ggplot(data=ts,aes(x=TimeC,y=vars,color=Sig)) +
geom_point()+
theme_bw() +
scale_color_manual(values=c("black","red")) +
geom_hline(yintercept = 2) +
geom_hline(yintercept = -2) +
scale_x_continuous(breaks=seq(from=300,to=1800,by=300)) +
labs(x='Time since target onset (in ms)',y='t value',title='Condition Effect (Shuffled)')
In this randomization there is not a single time frame during which the t test revealed a significant difference between our randomized pre-switch and post-switch conditions. As we’ll see in the next section, however, this will not always be the case (some of our randomizations will yield clusters of significant t tests).
Great, now we just need to repeat this process 1,000 times. You may have noticed earlier that I leveraged the `map2()` function from the `purrr` package to apply a function across multiple inputs. We’ll want a similar logic here, except with true parallel processing to speed things up. Rather than running the 46 t tests 1,000 times in sequence (i.e., one randomization after the other), we can run several randomizations simultaneously. How many depends on how many cores your computer has; the cluster below uses all but one of my computer’s cores, giving 7 nodes. So we can run the code for multiple randomizations in parallel.
c1 <- makeCluster(parallel::detectCores()-1)
registerDoSNOW(c1)
c1
## socket cluster with 7 nodes on host 'localhost'
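One housekeeping note: once all of the parallel work at the end of this document is finished, it’s good practice to shut the cluster down so the worker processes are released (commented out here, like the other run-once code):
# stopCluster(c1)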
note on syntax:
we’re using similar tidyverse code as before, but embedding it within a `foreach(){}` loop that uses the `%dopar%` operator from the `foreach` package to complete multiple iterations of our code in parallel; essentially running the first 7 iterations through our for loop (i.e., i = 1 through 7) at the same time.
the first part of the code uses `txtProgressBar()` to help us graphically track the progress of the 1,000 randomizations
# nrep=1000
# pb <- txtProgressBar(max=nrep, style=3)
# progress <- function(n) setTxtProgressBar(pb, n)
# opts <- list(progress=progress)
#
# ts_randCond = foreach (i = 1:nrep, .combine = c, .options.snow = opts, .packages = c("tidyverse","purrr")) %dopar% {
#
# bySub = d %>%
# filter(TimeC > -2000 & TimeC < 3000) %>%
# shuffConditions(i) %>%
# group_by(Sub.Num,CondR,TimeC) %>%
# summarise(Accuracy = mean(Accuracy,na.rm=T))
#
# bySubWide = bySub %>%
# pivot_wider(names_from=CondR,values_from=Accuracy) %>%
# mutate(
# pre = ifelse(is.na(`post-switch`),NA,`pre-switch`),
# post = ifelse(is.na(`pre-switch`),NA,`post-switch`))
#
# ts <- bySubWide %>%
# filter(TimeC >=300 & TimeC<=1800) %>%
# group_by(TimeC) %>%
# summarise(vars = t.test(post, pre, paired = TRUE)$statistic) %>%
# mutate(Sig = ifelse(abs(vars)>2,"yes","no"))
#
# ts = ts %>% select(TimeC,vars)
#
# max_cluster_window = get_max_cluster_window(ts)
#
# return(max_cluster_window$auc)
#
# }
The output from this is a vector of 1,000 values. Each value is the area under the curve (auc) for the largest cluster of significant t values from one randomization:
x |
---|
0.00 |
-5.10 |
2.14 |
-4.18 |
0.00 |
0.00 |
0.00 |
-7.66 |
-24.32 |
0.00 |
This code takes my computer somewhere between 1 and 2 minutes to complete. To speed up the process of knitting the markdown file, let’s save the output as an .RData file, reload it in future knits, and comment out the original code:
# save(ts_randCond,file='ts_randCond.Rdata')
load('ts_randCond.Rdata')
Now we can conduct our test of significance by determining what proportion of the 1,000 randomizations produced a maximum cluster auc value that exceeds (in absolute value) the auc value of the largest cluster in our actual data.
p_obs <- mean(abs(ts_randCond) > abs(max_cluster$auc))
t_crit_pos <- quantile(ts_randCond, probs = 0.975)
t_crit_neg <- quantile(ts_randCond, probs = 0.025)
ggplot(mapping = aes(x = ts_randCond)) +
geom_density(fill = "green", alpha = .5) +
geom_vline(xintercept = t_crit_pos) +
geom_vline(xintercept = t_crit_neg) +
geom_vline(xintercept = max_cluster$auc, colour = "blue") +
labs(
title = "Condition Effect, 300-1800 ms",
x = "Largest Cluster (1000 Permutations)"
)
With the raw data, children’s accuracy in fixating the target object was higher for pre-switch compared to post-switch trials during a window from 533ms to 833ms after the onset of the critical word. The size of this effect (i.e., area under the curve) is -24.8.
When reshuffling the data (randomly re-assigning a child’s trials to the pre- vs. post-switch condition), we observe an effect at least this large only 3.3% of the time.
Therefore, we can conclude that the difference we observe between the trials before (pre-switch) and after (post-switch) the change in dimensions is statistically significant.
We now have all of the tools to repeat this process using a more complicated model: logistic mixed effects modeling. We’re going to fit 46 of these models, one for each time frame. We will keep the data at the individual trial level; there will be no aggregating, so we’ll be dealing with accuracies of 1’s and 0’s (e.g., was the participant looking at the target or the distractor image at time 300 ms on trial 2), hence the need for logistic regression. We’re going to use the `glmer()` function from the `lme4` package.
We just need to make a few modifications to our functions to extract information from these models rather than t-tests (extracting z, rather than t values).
extract_glmer_z <- function(this_lm){
  # convert the fixed effects table from the model summary into a data frame
  this_lm = as.data.frame(coef(summary(this_lm)))
  lm_table <- this_lm %>%
    mutate(term = rownames(this_lm)) %>%
    select(term, `z value`) %>%
    rename(z = `z value`) %>%
    filter(term != "(Intercept)") # keep only the z values for the predictors
  return(lm_table)
}
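As an aside, if you have the `broom.mixed` package installed, an equivalent extraction is shorter (a sketch, not the approach used in the rest of this document):
extract_glmer_z2 <- function(this_lm){
  broom.mixed::tidy(this_lm, effects = "fixed") %>% # fixed effects as a tidy data frame
    filter(term != "(Intercept)") %>%
    select(term, z = statistic) # 'statistic' holds the z value for glmer fits
}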
get_max_cluster_glmer <- function(m, threshold = 2){
colnames(m) <- c("time","term", "stat")
m = as.data.frame(m)
# Calculate all cluster sizes
d_clust <- m %>%
mutate(
sig = ifelse(stat >= threshold, "pos",
ifelse(stat <= -threshold, "neg", "ns")),
cluster = data.table::rleid(sig)
) %>%
filter(sig != "ns") %>%
group_by(cluster) %>%
summarise(
start = min(time),
stop = max(time),
auc = sum(stat))
max_cluster = data.frame(cluster=0,start=0,stop=0,auc=0)
if(nrow(d_clust) > 0) {max_cluster = d_clust[which.max(abs(d_clust$auc)),]}
return(max_cluster)
}
Here’s a peek at what the data looks like:
bySub = d %>%
filter(TimeC>=300 & TimeC <=1800 & percentMissingFrames < 50) %>%
select(Sub.Num,Order,Tr.Num,Condition,TimeC,Accuracy)
bySub$Condition <- factor(bySub$Condition, c('pre-switch','post-switch'))
contrasts(bySub$Condition) = c(-.5,.5)
colnames(attr(bySub$Condition,"contrasts")) = ""
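Printing the contrasts confirms the coding: pre-switch = -0.5 and post-switch = 0.5, so the Condition effect is centered and the intercept reflects the grand mean on the logit scale (exact print formatting may differ):
contrasts(bySub$Condition)
## pre-switch  -0.5
## post-switch  0.5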
Sub.Num | Order | Tr.Num | Condition | TimeC | Accuracy |
---|---|---|---|---|---|
501 | CN-1A | 2 | pre-switch | 300 | 0 |
501 | CN-1A | 2 | pre-switch | 333 | NA |
501 | CN-1A | 2 | pre-switch | 367 | NA |
501 | CN-1A | 2 | pre-switch | 400 | NA |
501 | CN-1A | 2 | pre-switch | 433 | NA |
501 | CN-1A | 2 | pre-switch | 467 | 1 |
501 | CN-1A | 2 | pre-switch | 500 | 1 |
501 | CN-1A | 2 | pre-switch | 533 | 1 |
501 | CN-1A | 2 | pre-switch | 567 | 1 |
501 | CN-1A | 2 | pre-switch | 600 | 1 |
501 | CN-1A | 2 | pre-switch | 633 | 1 |
501 | CN-1A | 2 | pre-switch | 667 | 1 |
501 | CN-1A | 2 | pre-switch | 700 | 1 |
501 | CN-1A | 2 | pre-switch | 733 | 1 |
501 | CN-1A | 2 | pre-switch | 767 | 1 |
501 | CN-1A | 2 | pre-switch | 800 | 1 |
501 | CN-1A | 2 | pre-switch | 833 | 1 |
501 | CN-1A | 2 | pre-switch | 867 | 1 |
501 | CN-1A | 2 | pre-switch | 900 | 1 |
501 | CN-1A | 2 | pre-switch | 933 | 1 |
Sub.Num | Order | Tr.Num | Condition | TimeC | Accuracy | |
---|---|---|---|---|---|---|
1 | 501 | CN-1A | 2 | pre-switch | 300 | 0 |
47 | 501 | CN-1A | 3 | pre-switch | 300 | 1 |
93 | 501 | CN-1A | 4 | pre-switch | 300 | 1 |
139 | 501 | CN-1A | 6 | pre-switch | 300 | 1 |
185 | 501 | CN-1A | 7 | pre-switch | 300 | 0 |
231 | 501 | CN-1A | 8 | pre-switch | 300 | 0 |
277 | 501 | CN-1A | 9 | pre-switch | 300 | 1 |
323 | 501 | CN-1A | 10 | post-switch | 300 | 1 |
369 | 501 | CN-1A | 11 | post-switch | 300 | 0 |
415 | 501 | CN-1A | 12 | post-switch | 300 | 1 |
461 | 501 | CN-1A | 13 | post-switch | 300 | 0 |
507 | 501 | CN-1A | 14 | post-switch | 300 | 1 |
553 | 501 | CN-1A | 15 | post-switch | 300 | NA |
599 | 501 | CN-1A | 16 | post-switch | 300 | 0 |
645 | 501 | CN-1A | 17 | post-switch | 300 | 0 |
691 | 502 | CN-1B | 2 | pre-switch | 300 | 0 |
737 | 502 | CN-1B | 5 | pre-switch | 300 | 1 |
783 | 502 | CN-1B | 6 | pre-switch | 300 | 0 |
829 | 502 | CN-1B | 7 | pre-switch | 300 | 1 |
875 | 502 | CN-1B | 8 | pre-switch | 300 | 0 |
And here’s the syntax for repeating the glmer() model 46 times (one for each time frame during our critical window):
zs <- bySub %>%
group_by(TimeC) %>%
group_modify(~extract_glmer_z(glmer(Accuracy ~ Condition + (Condition|Sub.Num), data=.x,family=binomial))) %>%
mutate(Sig = ifelse(abs(z)>2,"yes","no"))
Here’s what the output looks like:
TimeC | term | z | Sig |
---|---|---|---|
300 | Condition | 0.54 | no |
333 | Condition | 0.15 | no |
367 | Condition | -0.72 | no |
400 | Condition | -0.71 | no |
433 | Condition | -0.79 | no |
467 | Condition | -0.75 | no |
500 | Condition | -1.02 | no |
533 | Condition | -2.05 | yes |
567 | Condition | -2.27 | yes |
600 | Condition | -2.99 | yes |
633 | Condition | -3.01 | yes |
667 | Condition | -2.97 | yes |
700 | Condition | -2.89 | yes |
733 | Condition | -2.62 | yes |
767 | Condition | -2.43 | yes |
800 | Condition | -2.35 | yes |
833 | Condition | -251.78 | yes |
867 | Condition | -2.22 | yes |
900 | Condition | -1.58 | no |
933 | Condition | -1.14 | no |
Notice the massive jump in the z value for the 833 ms time frame? I’ve suppressed warnings when knitting the markdown file, but that model was generating the following warning:
Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model failed to converge with max|grad| = 0.0280114 (tol = 0.002, component 1)
Warnings about convergence are serious: the optimizer may not have found the parameter values that actually maximize the likelihood, so the resulting test of significance may not be appropriate.
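If you want to flag these cases programmatically (e.g., to count convergence failures across many fits), one option is to check the messages that lme4 stores on the fitted model. This is a sketch relying on lme4’s internal `optinfo` slot:
# TRUE if lme4 recorded any convergence messages for this fit
has_conv_warning <- function(m) length(m@optinfo$conv$lme4$messages) > 0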
Let’s check out the model for the 800 ms frame, right before the problematic test:
temp = bySub[bySub$TimeC==800,]
Sub.Num | Order | Tr.Num | Condition | TimeC | Accuracy | |
---|---|---|---|---|---|---|
16 | 501 | CN-1A | 2 | pre-switch | 800 | 1 |
62 | 501 | CN-1A | 3 | pre-switch | 800 | 1 |
108 | 501 | CN-1A | 4 | pre-switch | 800 | 1 |
154 | 501 | CN-1A | 6 | pre-switch | 800 | 1 |
200 | 501 | CN-1A | 7 | pre-switch | 800 | 1 |
246 | 501 | CN-1A | 8 | pre-switch | 800 | 1 |
292 | 501 | CN-1A | 9 | pre-switch | 800 | 1 |
338 | 501 | CN-1A | 10 | post-switch | 800 | 1 |
384 | 501 | CN-1A | 11 | post-switch | 800 | 1 |
430 | 501 | CN-1A | 12 | post-switch | 800 | 1 |
476 | 501 | CN-1A | 13 | post-switch | 800 | 1 |
522 | 501 | CN-1A | 14 | post-switch | 800 | 1 |
568 | 501 | CN-1A | 15 | post-switch | 800 | 0 |
614 | 501 | CN-1A | 16 | post-switch | 800 | 1 |
660 | 501 | CN-1A | 17 | post-switch | 800 | 1 |
706 | 502 | CN-1B | 2 | pre-switch | 800 | 1 |
752 | 502 | CN-1B | 5 | pre-switch | 800 | 1 |
798 | 502 | CN-1B | 6 | pre-switch | 800 | 1 |
844 | 502 | CN-1B | 7 | pre-switch | 800 | 1 |
890 | 502 | CN-1B | 8 | pre-switch | 800 | 1 |
m = glmer(Accuracy ~ Condition + (Condition|Sub.Num), data=temp,family=binomial)
summary(m)
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: binomial ( logit )
## Formula: Accuracy ~ Condition + (Condition | Sub.Num)
## Data: temp
##
## AIC BIC logLik deviance df.resid
## 606.7 629.6 -298.3 596.7 715
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.3904 0.2880 0.3592 0.4536 0.5919
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## Sub.Num (Intercept) 0.01079 0.1039
## Condition 1.12654 1.0614 -1.00
## Number of obs: 720, groups: Sub.Num, 56
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.8643 0.1286 14.497 <0.0000000000000002 ***
## Condition -0.7048 0.2997 -2.351 0.0187 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## Condition -0.357
And now the test for 833 ms (the problematic one):
temp = bySub[bySub$TimeC==833,]
m = glmer(Accuracy ~ Condition + (Condition|Sub.Num), data=temp,family=binomial)
summary(m)
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: binomial ( logit )
## Formula: Accuracy ~ Condition + (Condition | Sub.Num)
## Data: temp
##
## AIC BIC logLik deviance df.resid
## 602.8 625.8 -296.4 592.8 726
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.2745 0.2963 0.3466 0.4578 0.5132
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## Sub.Num (Intercept) 0.03061 0.1750
## Condition 0.80916 0.8995 -1.00
## Number of obs: 731, groups: Sub.Num, 56
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.894123 0.002607 726.6 <0.0000000000000002 ***
## Condition -0.677740 0.002692 -251.8 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## Condition 0.000
## optimizer (Nelder_Mead) convergence code: 0 (OK)
## Model failed to converge with max|grad| = 0.0280114 (tol = 0.002, component 1)
Let’s refit this model without the random slope for condition:
m = glmer(Accuracy ~ Condition + (1|Sub.Num), data=temp,family=binomial)
summary(m)
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: binomial ( logit )
## Formula: Accuracy ~ Condition + (1 | Sub.Num)
## Data: temp
##
## AIC BIC logLik deviance df.resid
## 600.9 614.6 -297.4 594.9 728
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.8460 0.3514 0.3514 0.4639 0.4639
##
## Random effects:
## Groups Name Variance Std.Dev.
## Sub.Num (Intercept) 0.0000000001301 0.00001141
## Number of obs: 731, groups: Sub.Num, 56
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.8140 0.1081 16.77 <0.0000000000000002 ***
## Condition -0.5558 0.2163 -2.57 0.0102 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## Condition -0.201
## optimizer (Nelder_Mead) convergence code: 0 (OK)
## boundary (singular) fit: see ?isSingular
We are no longer getting a massive z value for the effect of condition, and its value is more in line with the tests for the other time frames.
This hearkens back to our earlier discussion about how defaulting to the maximal random effects structure (an intercept plus a slope for every within-subject effect) may not always be appropriate. I am not quite sure what’s happening here (I need to do some more digging), but I think it’s reflective of our underlying problem: there’s not enough data (as we’ll see in a moment when we start fitting the models to randomized data) and our model fits are precarious (i.e., many will not converge). When you’re getting these kinds of warnings it’s always important to check the model estimates themselves.
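Before dropping the random slope entirely, another option worth trying (a sketch; it does not always help) is switching optimizers and raising the iteration limit via `glmerControl()`:
# refit with the bobyqa optimizer and a higher iteration cap
m2 = glmer(Accuracy ~ Condition + (Condition|Sub.Num),
  data = temp, family = binomial,
  control = glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e5)))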
Let’s complete the series of 46 glmer models without the random slope:
zs <- bySub %>%
group_by(TimeC) %>%
group_modify(~extract_glmer_z(glmer(Accuracy ~ Condition + (1|Sub.Num), data=.x,family=binomial))) %>%
mutate(Sig = ifelse(abs(z)>2,"yes","no"))
We’re finding reasonable and consistent z values for all time frames (although we did get one (suppressed) convergence warning for our 3rd-to-last model fit):
TimeC | term | z | Sig |
---|---|---|---|
300 | Condition | 0.51 | no |
333 | Condition | 0.13 | no |
367 | Condition | -0.71 | no |
400 | Condition | -0.79 | no |
433 | Condition | -1.07 | no |
467 | Condition | -0.93 | no |
500 | Condition | -1.25 | no |
533 | Condition | -2.40 | yes |
567 | Condition | -2.80 | yes |
600 | Condition | -3.13 | yes |
633 | Condition | -3.35 | yes |
667 | Condition | -3.38 | yes |
700 | Condition | -3.29 | yes |
733 | Condition | -3.25 | yes |
767 | Condition | -2.92 | yes |
800 | Condition | -2.86 | yes |
833 | Condition | -2.57 | yes |
867 | Condition | -2.32 | yes |
900 | Condition | -1.76 | no |
933 | Condition | -1.40 | no |
967 | Condition | -1.41 | no |
1000 | Condition | -0.93 | no |
1033 | Condition | -1.14 | no |
1067 | Condition | -1.53 | no |
1100 | Condition | -1.63 | no |
1133 | Condition | -1.52 | no |
1167 | Condition | -1.42 | no |
1200 | Condition | -2.05 | yes |
1233 | Condition | -1.68 | no |
1267 | Condition | -1.53 | no |
1300 | Condition | -1.80 | no |
1333 | Condition | -1.85 | no |
1367 | Condition | -2.14 | yes |
1400 | Condition | -2.70 | yes |
1433 | Condition | -2.35 | yes |
1467 | Condition | -1.97 | no |
1500 | Condition | -1.76 | no |
1533 | Condition | -2.01 | yes |
1567 | Condition | -2.21 | yes |
1600 | Condition | -2.11 | yes |
1633 | Condition | -1.41 | no |
1667 | Condition | -1.77 | no |
1700 | Condition | -1.39 | no |
1733 | Condition | -1.19 | no |
1767 | Condition | -1.17 | no |
1800 | Condition | -0.97 | no |
ggplot(data=zs[zs$term=='Condition',],aes(x=TimeC,y=z,color=Sig)) +
geom_point()+
theme_bw() +
scale_color_manual(values=c("black","red")) +
geom_hline(yintercept = 2) +
geom_hline(yintercept = -2) +
scale_x_continuous(breaks=seq(from=300,to=1800,by=300)) +
labs(x='Time since target onset (in ms)',y='z values')
Let’s extract the area under the curve for the largest cluster of significant effects. It’s the same cluster as when we used the t tests, except that it extends one frame longer, to 867 ms:
max_cluster_condition = get_max_cluster_glmer(zs %>% select(-Sig) %>% filter(term=='Condition'))
max_cluster_condition$term = 'Condition'
max_cluster = max_cluster_condition %>% select(auc,term)
cluster | start | stop | auc | term |
---|---|---|---|---|
2 | 533 | 867 | -32.28 | Condition |
byGroup = d %>%
filter(TimeC>=300 & TimeC <=1800 & percentMissingFrames < 50) %>% # select time window & max missing frames
group_by(Sub.Num, Condition, TimeC) %>% # aggregate over trials in each condition for each subject
summarise(Accuracy=mean(Accuracy,na.rm=T)) %>%
group_by(Condition, TimeC) %>% # aggregate over participants in each group for each condition
summarise(
N = sum(!is.na(Accuracy)),
SD = sd(Accuracy,na.rm=TRUE),
SE = SD/sqrt(N),
Accuracy = mean(Accuracy,na.rm=TRUE),
lower=Accuracy-SE,
upper=Accuracy+SE)
byGroup$Condition = factor(byGroup$Condition, c("pre-switch","post-switch"))
ggplot(byGroup, aes(x=TimeC, y=Accuracy, fill=Condition, color=Condition)) +
geom_hline(aes(yintercept=0.5),linetype='solid',color='gray') +
geom_line()+
geom_segment(x=max_cluster_condition$start,xend=max_cluster_condition$stop,y=.41,yend=.41,linetype='solid',color='black') +
geom_smooth(aes(ymin=lower, ymax=upper), stat="identity") +
geom_vline(aes(xintercept=0), linetype="dashed", color="gray") +
theme_bw(base_size=10) +
coord_cartesian(xlim=c(300,1800), ylim= c(.40,1.01),expand=F) +
scale_x_continuous(breaks=seq(from=-900,to=2400,by=300)) +
scale_y_continuous(breaks=seq(from=0,to=1,by=.1)) +
scale_fill_manual(values=c("dodgerblue","coral2")) +
scale_color_manual(values=c("dodgerblue","coral2")) +
labs(x='Time since target onset (in ms)',y='Proportion Looking to Target',title='Condition Effect') +
theme(legend.justification=c(1,0), plot.title=element_text(hjust=.5),legend.position=c(1,0),legend.background=element_rect(fill= NA, color=NA),legend.title=element_blank(),legend.text = element_text(size = 11))
bySub = d %>%
filter(TimeC>=300 & TimeC <=1800 & percentMissingFrames < 50) %>%
shuffConditions(3) %>%
select(Sub.Num,Order,Tr.Num,CondR,TimeC,Accuracy)
bySub$CondR <- factor(bySub$CondR, c('pre-switch','post-switch'))
contrasts(bySub$CondR) = c(-.5,.5)
colnames(attr(bySub$CondR,"contrasts")) = ""
zs_rand1 <- bySub %>%
group_by(TimeC) %>%
group_modify(~extract_glmer_z(glmer(Accuracy ~ CondR + (1|Sub.Num), data=.x,family=binomial))) %>%
mutate(Sig = ifelse(abs(z)>2,"yes","no"))
max_cluster_rand = get_max_cluster_glmer(zs_rand1 %>% select(-Sig) %>% filter(term=='CondR'))
max_cluster_rand$term = 'CondR'
TimeC | term | z | Sig |
---|---|---|---|
300 | CondR | -0.73 | no |
333 | CondR | -0.83 | no |
367 | CondR | -0.59 | no |
400 | CondR | -0.94 | no |
433 | CondR | -1.07 | no |
467 | CondR | -0.90 | no |
500 | CondR | -0.99 | no |
533 | CondR | -0.75 | no |
567 | CondR | -1.45 | no |
600 | CondR | -1.92 | no |
633 | CondR | -1.79 | no |
667 | CondR | -1.51 | no |
700 | CondR | -1.55 | no |
733 | CondR | -1.00 | no |
767 | CondR | -1.05 | no |
800 | CondR | -0.33 | no |
833 | CondR | -0.60 | no |
867 | CondR | -0.81 | no |
900 | CondR | -0.69 | no |
933 | CondR | -0.21 | no |
967 | CondR | -0.01 | no |
1000 | CondR | -0.03 | no |
1033 | CondR | -0.18 | no |
1067 | CondR | 0.51 | no |
1100 | CondR | 0.72 | no |
1133 | CondR | 0.66 | no |
1167 | CondR | 1.03 | no |
1200 | CondR | 1.52 | no |
1233 | CondR | 1.69 | no |
1267 | CondR | 1.53 | no |
1300 | CondR | 138.31 | yes |
1333 | CondR | 1.72 | no |
1367 | CondR | 1.53 | no |
1400 | CondR | 1.57 | no |
1433 | CondR | 1.48 | no |
1467 | CondR | 1.66 | no |
1500 | CondR | 1.55 | no |
1533 | CondR | 1.72 | no |
1567 | CondR | 1.87 | no |
1600 | CondR | 1.92 | no |
1633 | CondR | 2.26 | yes |
1667 | CondR | 2.06 | yes |
1700 | CondR | 2.02 | yes |
1733 | CondR | 1.23 | no |
1767 | CondR | 0.64 | no |
1800 | CondR | 0.74 | no |
ggplot(data=zs_rand1[zs_rand1$term=='CondR',],aes(x=TimeC,y=z,color=Sig)) +
geom_point()+
theme_bw() +
scale_color_manual(values=c("black","red")) +
geom_hline(yintercept = 2) +
geom_hline(yintercept = -2) +
scale_x_continuous(breaks=seq(from=300,to=1800,by=300)) +
labs(x='Time since target onset (in ms)',y='z values')
One of our glmer() model fits returned a massive significant effect:
## # A tibble: 1 × 5
## cluster start stop auc term
## <int> <int> <int> <dbl> <chr>
## 1 2 1300 1300 138. CondR
This is a large effect size for a time frame (1300 ms) in our data where there is not a large difference between our randomized conditions:
byGroup = bySub %>%
group_by(CondR, TimeC) %>% # aggregate over participants in each group for each condition
summarise(
N = sum(!is.na(Accuracy)),
SD = sd(Accuracy,na.rm=TRUE),
SE = SD/sqrt(N),
Accuracy = mean(Accuracy,na.rm=TRUE),
lower=Accuracy-SE,
upper=Accuracy+SE)
byGroup$CondR = factor(byGroup$CondR, c("pre-switch","post-switch"))
ggplot(byGroup, aes(x=TimeC, y=Accuracy, fill=CondR, color=CondR)) +
geom_hline(aes(yintercept=0.5),linetype='solid',color='gray') +
geom_line()+
geom_segment(x=max_cluster_condition$start,xend=max_cluster_condition$stop,y=.41,yend=.41,linetype='solid',color='black') +
geom_smooth(aes(ymin=lower, ymax=upper), stat="identity") +
geom_vline(aes(xintercept=0), linetype="dashed", color="gray") +
theme_bw(base_size=10) +
coord_cartesian(xlim=c(300,1800), ylim= c(.40,1.01),expand=F) +
scale_x_continuous(breaks=seq(from=-900,to=2400,by=300)) +
scale_y_continuous(breaks=seq(from=0,to=1,by=.1)) +
scale_fill_manual(values=c("dodgerblue","coral2")) +
scale_color_manual(values=c("dodgerblue","coral2")) +
labs(x='Time since target onset (in ms)',y='Proportion Looking to Target',title='Condition Randomized') +
theme(legend.justification=c(1,0), plot.title=element_text(hjust=.5),legend.position=c(1,0),legend.background=element_rect(fill= NA, color=NA),legend.title=element_blank(),legend.text = element_text(size = 11))
As before, this model is failing to converge:
temp = d %>%
filter(TimeC==1300 & percentMissingFrames < 50) %>%
shuffConditions(3) %>%
select(Sub.Num,Order,Tr.Num,CondR,TimeC,Accuracy)
temp$CondR <- factor(temp$CondR, c('pre-switch','post-switch'))
contrasts(temp$CondR) = c(-.5,.5)
colnames(attr(temp$CondR,"contrasts")) = ""
m = glmer(Accuracy ~ CondR + (1|Sub.Num), data=temp,family=binomial)
summary(m)
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: binomial ( logit )
## Formula: Accuracy ~ CondR + (1 | Sub.Num)
## Data: temp
##
## AIC BIC logLik deviance df.resid
## 377.4 391.1 -185.7 371.4 694
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.2190 0.2305 0.2639 0.3078 0.3948
##
## Random effects:
## Groups Name Variance Std.Dev.
## Sub.Num (Intercept) 0.1695 0.4118
## Number of obs: 697, groups: Sub.Num, 56
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.597760 0.003783 686.6 <0.0000000000000002 ***
## CondR 0.522888 0.003781 138.3 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## CondR 0.000
## optimizer (Nelder_Mead) convergence code: 0 (OK)
## Model failed to converge with max|grad| = 0.0186908 (tol = 0.002, component 1)
This is concerning, because it suggests that our data (i.e., 8 trials per condition per participant) may not be sufficient for logistic mixed effects models to converge. We can confirm this by seeing how often we find large effects in our 1,000 randomizations.
We’re using the same `foreach` and `%dopar%` syntax to re-run the updated code, fitting the glmer() models 1,000 times. These models fit fairly rapidly, but still take longer than t tests, and this adds up when running 46 tests 1,000 times. It took my computer just under 8 minutes to complete this process:
# nrep=1000
# pb <- txtProgressBar(max=nrep, style=3)
# progress <- function(n) setTxtProgressBar(pb, n)
# opts <- list(progress=progress)
#
# startTime = format(Sys.time(), "%X")
#
# zs_randCond = foreach (i = 1:nrep, .combine = rbind, .inorder=F, .options.snow = opts, .packages = c("tidyverse","purrr")) %dopar% {
#
# bySub = d %>%
# filter(TimeC>=300 & TimeC <=1800 & percentMissingFrames < 50) %>%
# shuffConditions(i) %>%
# select(Sub.Num,Order,Tr.Num,CondR,TimeC,Accuracy)
#
# bySub$CondR <- factor(bySub$CondR, c('pre-switch','post-switch'))
# contrasts(bySub$CondR) = c(-.5,.5)
# colnames(attr(bySub$CondR,"contrasts")) = ""
#
# zs <- bySub %>%
# group_by(TimeC) %>%
# group_modify(~extract_glmer_z(glmer(Accuracy ~ CondR + (1|Sub.Num), data=.x,family=binomial))) %>%
# mutate(Sig = ifelse(abs(z)>2,"yes","no"))
#
# max_cluster_condition = get_max_cluster_glmer(zs %>% select(-Sig) %>% filter(term=='CondR'))
# max_cluster_condition$term = 'CondR'
# max_cluster = (max_cluster_condition) %>% select(auc,term)
#
# return(max_cluster)
#
# }
#
# stopTime = format(Sys.time(), "%X")
auc | term |
---|---|
-455.45 | CondR |
-433.39 | CondR |
-263.89 | CondR |
-262.08 | CondR |
-247.15 | CondR |
-244.38 | CondR |
-231.99 | CondR |
-220.94 | CondR |
-220.54 | CondR |
-207.47 | CondR |
-206.91 | CondR |
-203.96 | CondR |
-194.98 | CondR |
-192.52 | CondR |
-188.84 | CondR |
-186.28 | CondR |
-179.28 | CondR |
-176.48 | CondR |
-168.79 | CondR |
-164.25 | CondR |
Yeah, there are a lot of very large auc values. So much so that the significant effect we observed when using t tests is not statistically significant when using the logistic mixed effects models:
p_obs <- mean(abs(zs_randCond$auc) > abs(max_cluster_condition$auc))
t_crit_pos <- quantile(zs_randCond$auc, probs = 0.975)
t_crit_neg <- quantile(zs_randCond$auc, probs = 0.025)
ggplot(mapping = aes(x = zs_randCond$auc)) +
geom_density(fill = "green", alpha = .5) +
geom_vline(xintercept = t_crit_pos) +
geom_vline(xintercept = t_crit_neg) +
geom_vline(xintercept = max_cluster_condition$auc, colour = "blue") +
labs(
title = "Condition Effect, 300-1800 ms",
x = "Largest Cluster (1000 Permutations)"
)
With the raw data, children’s accuracy in fixating the target object was higher for pre-switch compared to post-switch trials during a window from 533ms to 867ms after the onset of the critical word. The size of this effect (i.e., area under the curve) is -32.28.
When reshuffling the data (randomly re-assigning a child’s trials to the pre- vs. post-switch condition), we observe an effect at least this large 58.3% of the time.
What these results suggest is that 8 observations per condition (16 total) for each participant is not sufficient for logistic mixed effects modeling. For many of the time frames the models converged, but they failed to converge a non-trivial number of times. We can see this most clearly in the distribution of auc values.
I have successfully used logistic mixed effects models for cluster-based permutation analyses for another data set. That data set, however, contained 24 trials per condition (48 total). So, if you need to use logistic mixed effects modeling for cluster-based permutation analyses (e.g., you have multiple fixed effects or want to include a cognitive measure as a covariate) then you should plan to have more than 8 trials for each level of your fixed effect.