Repeated Measures Analysis in R

Goals for today

Learn how to analyze a repeated measure/longitudinal design:

Within subject 2x4 factorial design, repeated measures ANOVA
- use the afex (analysis of fixed effects) package in R
  - specify nested terms in model formula
- Effect size: R-squared, generalized eta-squared, partial eta-squared
- post-hoc simple effects and pairwise comparisons with emmeans

Step 1 - Get organized

Download this data file and save it in the “data” folder within the project folder you created for this week:
- “mentalrotation_bysub_tidy.csv”
Now open RStudio and start a new project, select “Existing Directory” and select the folder you set up earlier for this activity as the location
Install the package “ggbeeswarm” in the console:
- install.packages("ggbeeswarm")
In RStudio, start a new R markdown and do your work in there, save the file in a subfolder called r_docs
- put these lines in your “setup” code chunk:
  knitr::opts_chunk$set(echo = TRUE)
  knitr::opts_knit$set(root.dir = rprojroot::find_rstudio_root_file())
  library(tidyverse)
  library(afex)
run the setup code chunk

Step 2 - Import data and check it out

This data is from Ganis and Kievit (2016), a replication of the Shephard and Metzler (1971). The is the same dataset that we used in the data visualization activity.
In this study subjects had to mentally rotate a 3D shape and respond whether it was the same or different compared to a reference shape. The angle of rotation was manipulated (within subjects) at 0, 50, 100, and 150 degrees as well as the desired response (whether the shape was actually “same” or “different”).
Each line in the mentalrotationbysub_tidy.csv file represents 1 subject’s performance on 1 condition of the task. The RT column is the mean response time (milliseconds) across all trials of a given type for a given subject (only trials with correct responses are included).
There are 4 discrete levels of Angle, so we will treat the Angle condition as an ordered categorical variable (ordinal)
This gives use 8 repeated measures for each subject: 4 levels of Angle * 2 levels of DesiredResponse

What to do:

Make a new code chunk and use readr::read_csv() to read in the data. Make sure that NA values are handled the way you want (click on the tibble in the Environment window pane to take a quick look after you import the data). Notice that the data are in a different format (tidy/long) than they were in the file we imported into SPSS (wide format), but it is the same data.
Set DesiredResponse, and Angle as factors, using forcats::as_factor(). Check the order of the levels of each factor using the levels() function - are the levels in the correct order (especially for Angle)?
Make a table of means and confidence intervals by condition.
Plot RT in whatever way you choose - the solution below is a violin chart of the distributions with means and CI overlayed. A boxplot is also useful for seeing extreme values.
Make note of your observations - are there extreme values that you’re concerned about? We have plenty of data points so let’s not worry too much about the slightly non-normal distributions.

#1. first import the data
#funny unintended result: "NA == "na" is interpreted by R as FALSE, and the function accepts it as a value for the first option `col_types`
mrot_tib <- readr::read_csv("data/mentalrotation_bysub_tidy.csv", FALSE) 
#correct
mrot_tib <- readr::read_csv("data/mentalrotation_bysub_tidy.csv", na = "na")

#2. now make sure the columns we want as factors are treated that way, using forcats::as_factor() set DesiredResponse and Angle as factor variable 
mrot_tib <- mrot_tib  |>  dplyr::mutate(
  DesiredResponse = forcats::as_factor(DesiredResponse),
  Angle = forcats::as_factor(Angle)
)
# check order of factor variables - use cat() to make the output more clear
cat("levels of Angle: ",levels(mrot_tib$Angle),"\n")
cat("levels of DesiredResponse: ",levels(mrot_tib$DesiredResponse),"\n")

#3. Make a table of means
#first store the means in a tibble
mrot_summary_tib <- mrot_tib |> 
  dplyr::group_by(Angle,DesiredResponse) |> 
  summarise(
    meanRT = mean(RT),
    RT.ci.low = mean_cl_normal(RT)$ymin,
    RT.ci.upp = mean_cl_normal(RT)$ymax
  ) |> ungroup()
#now display the tibble as a formatted table
mrot_summary_tib |> knitr::kable(caption = "Summary of RT", digits = 2) |> 
  kableExtra::kable_classic(lightable_options = "hover")

#4. Plot the means and CI in a line chart
p1 <- mrot_tib |>
  ggplot(aes(x = Angle, y = RT, fill=DesiredResponse)) + 
    geom_violin(position = position_dodge(.9)) +
    geom_point(data=mrot_summary_tib, 
               aes(x = Angle, y = meanRT, fill = DesiredResponse), 
               stat="identity", position = position_dodge(.9)) +
    geom_errorbar(data=mrot_summary_tib, 
                  aes(x = Angle, y = meanRT, ymin=RT.ci.low, ymax=RT.ci.upp),
                  stat="identity", width=.2, position_dodge(.9)) +
    theme_classic() + labs(title="RT violin plots w/ mean and CI", y = "response time (ms)", x = "angle of disparity (degrees)")
p1
p2 <- mrot_tib |>
  ggplot(aes(x = Angle, y = RT, fill=DesiredResponse)) + 
    geom_boxplot() +
    theme_classic() + 
    labs(title="RT box plots", y = "response time (ms)", x = "Angle (degrees)")
p2

## levels of Angle:  0 50 100 150 
## levels of DesiredResponse:  Different Same

Summary of RT
Angle	DesiredResponse	meanRT	RT.ci.low	RT.ci.upp
0	Different	2342.12	2128.90	2555.35
0	Same	1563.23	1408.18	1718.28
50	Different	2579.23	2336.54	2821.91
50	Same	2460.90	2261.75	2660.06
100	Different	3169.62	2925.45	3413.80
100	Same	3052.73	2817.74	3287.72
150	Different	3277.88	3047.08	3508.67
150	Same	3340.50	3085.95	3595.05

Step 3 - Fit a two-way repeated measures ANOVA model

Rep Measures decision chart

Above is the decision process chart from the Field textbook.

We will conduct a two-way repeated measures ANOVA model where RT is explained by (a) Angle, (b) DesiredResponse, and (c) the interaction of Angle and DesiredResponse. In plain words, we are examining whether the response time for a judgment that a shape is the same or different from a reference shape is influenced by (a) the angle of rotation of the shape from the reference, (b) whether the shape actually is the same or different, and (c) whether the effect of angle of rotation depends on whether the shape actually is the same or different.
Following the chart, we already took a quick look at the data distributions. Now we will fit the model.
Use the afex::aov_4() function. The function requires a model formula and data tibble. The formula is specified similarly to models we used before (RT ~ Angle), but this time we have to specify how the repeated measures are grouped using the | symbol. Our model formula here is RT ~ Angle*DesiredResponse + (Angle*DesiredResponse|Participant) – remember that, in a formula, A*B is shorthand for A + B + A:B (where : indicates an interaction)
The same model can be specified with other functions, including afex::aov_ez() or afex::aov_car() (each requires the model specification in a different format). We are using afex::aov_4() because the model specification format is similar to what we have used before (with lm()).

#1. fit the model
rt_rm <- afex::aov_4(RT ~ Angle*DesiredResponse + (Angle*DesiredResponse|Participant), data = mrot_tib)

#3. output the Anova Table
rt_rm

#4. output additional stats (sphericity correction info)
summary(rt_rm)

#5. output partial eta-squared effect sizes for comparison with SPSS  
effectsize::eta_squared(rt_rm, ci = .95, partial = TRUE, alternative = "two.sided") |>
  kableExtra::kbl(caption = "Effect Size: Partial Eta-squared", digits = 4) |>
  kableExtra::kable_styling(full_width = FALSE)

## Anova Table (Type 3 tests)
## 
## Response: RT
##                  Effect           df       MSE          F  ges p.value
## 1                 Angle 1.94, 102.86 325296.19 193.12 *** .299   <.001
## 2       DesiredResponse        1, 53 309859.43  19.72 *** .021   <.001
## 3 Angle:DesiredResponse 2.68, 141.90 186512.96  22.27 *** .038   <.001
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '+' 0.1 ' ' 1
## 
## Sphericity correction method: GG 
## 
## Univariate Type III Repeated-Measures ANOVA Assuming Sphericity
## 
##                           Sum Sq num Df  Error SS den Df F value    Pr(>F)    
## (Intercept)           3203813311      1 208901407     53 812.834 < 2.2e-16 ***
## Angle                  121919280      3  33460251    159 193.116 < 2.2e-16 ***
## DesiredResponse          6111038      1  16422550     53  19.722 4.583e-05 ***
## Angle:DesiredResponse   11122070      3  26466815    159  22.272 4.306e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Mauchly Tests for Sphericity
## 
##                       Test statistic  p-value
## Angle                        0.46568 0.000000
## Angle:DesiredResponse        0.83108 0.088421
## 
## 
## Greenhouse-Geisser and Huynh-Feldt Corrections
##  for Departure from Sphericity
## 
##                        GG eps Pr(>F[GG])    
## Angle                 0.64692  < 2.2e-16 ***
## Angle:DesiredResponse 0.89247  4.936e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                          HF eps   Pr(>F[HF])
## Angle                 0.6711268 2.705655e-36
## Angle:DesiredResponse 0.9444454 1.517512e-11

Effect Size: Partial Eta-squared
	Parameter	Eta2_partial	CI	CI_low	CI_high
1	Angle	0.7847	0.95	0.7297	0.8243
3	DesiredResponse	0.2712	0.95	0.0900	0.4479
2	Angle:DesiredResponse	0.2959	0.95	0.1776	0.3961

Understanding the output

Anova Table:
- Effect: one row for each term in the model
- F and p-value: same idea as the F-statistic in between subjects ANOVA, but all variance is within subjects (see Field textbook Chap 15 for more). The basic idea holds that the F-stat is the ratio of variance explained by the model to unexplained variance.
- ges (Generalized Eta-squared) - this is an effect size measure for each term in the model. Like eta-squared and R-squared, it represents the proportion of variance in RT that is explained by Angle (“generalized” to a repeated measures model). But notice that it is very different from the partial eta-squared value that you got in SPSS (see the solution code for how to output the partial eta-squared for this model in R). The difference is analogous to the two ways to compute Cohen’s d that we discussed for a paired t-test. The (larger) partial eta-squared value is specific to the within-subjects design (individual differences are not part of error variance), whereas the generalised eta-squared value is not (individual differences are part of the error variance). Which one you report depends on whether you want the effect size measure to be comparable across different designs. See Lakens (2013) for more discussion.
- Sphericity correction method: “GG” indicates the F (and p) are corrected with the Greenhouse-Geisser method
Mauchly Tests for Sphericity: Gives a statistical test for violation of sphericity (correlations between pairs of levels are not the same) - for each factor with more than two levels. The Field textbook (section 15.5.2) recommends we ignore this because it depends upon sample size (like any sig. test). Instead we look at estimates of sphericity and use the appropriate correction for non-sphericity (GG).
Greenhouse-Geisser and Huynh-Feldt Corrections:
- “GG/HF eps” give the epsilon value for the sphericity estimate. A value of 1 indicates perfect sphericity and lower values indicate non-sphericity.
- “Pr(>F[GG/HF])” gives the corrected p-value based on the Greenhouse-Geisser(GG) or Huynh-Feldt(HF) method. The GG value is the same as the p-value in the Anova Table (as it states “correction method: GG”. See section 15.5.4 of the textbook for details about the correction methods. Following the textbook’s advice we will use the GG correction.

Answer the following questions for yourself:

Is there a significant effect of angle of rotation (on response time)? (define your significance criterion)
Is there a significant effect of whether the shapes are the same or different? (define your significance criterion)
Is the effect of angle of rotation significantly moderated by whether the shapes are same or different?

Step 4 - follow up comparisons to interpret ANOVA results

So the output suggests that the effect of angle of rotation depends on whether the shapes are same or different (or we can equivalently state that the effect of whether a shape is same or different depends on the angle of rotation). How do we interpret that interaction effect?
One approach is to look at simple effects with the emmeans (estimated marginal means) package joint_tests() function. Simple effects analysis in a factorial design looks at the effect of one factor at each level of the other factor. Let’s look at the effect of Angle at each level of DesiredResponse, and then the effect of DesiredResponse at each level of Angle. The syntax for the joint_tests function is like this:

emmeans::joint_tests(rt_rm, "DesiredResponse") #effects at each level of DesiredResponse
emmeans::joint_tests(rt_rm, "Angle") #effects at each level of Angle

Another approach to characterize an interaction is with pairwise comparisons. This can get a little cumbersome when there are 28 possible comparisons, but we can restrict the number of comparisons by only looking at pairs of Angle conditions within each DesiredResponse condition. You can see the solution code for how to do this, but we won’t go over the output because the simple effects approach characterizes the pattern of means well.
You can also plot the means very easily using afex::afex_plot() (see the solution), but you probably already did this above.

#simple effect of Angle at each level of DesiredResponse
emmeans::joint_tests(rt_rm, "DesiredResponse")

#simple effect of DesiredResponse at each level of Angle
emmeans::joint_tests(rt_rm, "Angle")

#for paired comparisons: effect of Angle within DesiredResponse
int_emm <- emmeans::emmeans(rt_rm, ~Angle|DesiredResponse, method = "multivariate")
pairs(int_emm, adjust = "bonferroni")

# plot the means
afex::afex_plot(rt_rm, x = "Angle", trace = "DesiredResponse", error = "within")

## DesiredResponse = Different:
##  model term df1 df2 F.ratio p.value
##  Angle        3  53  43.544  <.0001
## 
## DesiredResponse = Same:
##  model term df1 df2 F.ratio p.value
##  Angle        3  53  81.227  <.0001
## 
## Angle = X0:
##  model term      df1 df2 F.ratio p.value
##  DesiredResponse   1  53  67.391  <.0001
## 
## Angle = X50:
##  model term      df1 df2 F.ratio p.value
##  DesiredResponse   1  53   2.084  0.1547
## 
## Angle = X100:
##  model term      df1 df2 F.ratio p.value
##  DesiredResponse   1  53   2.540  0.1170
## 
## Angle = X150:
##  model term      df1 df2 F.ratio p.value
##  DesiredResponse   1  53   0.442  0.5090
## 
## DesiredResponse = Different:
##  contrast    estimate    SE df t.ratio p.value
##  X0 - X50        -237  62.0 53  -3.824  0.0021
##  X0 - X100       -828  79.2 53 -10.450  <.0001
##  X0 - X150       -936  91.1 53 -10.268  <.0001
##  X50 - X100      -590  77.3 53  -7.638  <.0001
##  X50 - X150      -699  82.9 53  -8.424  <.0001
##  X100 - X150     -108  68.2 53  -1.588  0.7100
## 
## DesiredResponse = Same:
##  contrast    estimate    SE df t.ratio p.value
##  X0 - X50        -898  82.0 53 -10.950  <.0001
##  X0 - X100      -1490 108.2 53 -13.765  <.0001
##  X0 - X150      -1777 117.9 53 -15.070  <.0001
##  X50 - X100      -592  63.2 53  -9.359  <.0001
##  X50 - X150      -880  77.7 53 -11.320  <.0001
##  X100 - X150     -288  73.9 53  -3.895  0.0017
## 
## P value adjustment: bonferroni method for 6 tests

Understanding the output

The simple effect of Angle at each level of DesiredResponse is significant
The simple effect of DesiredResponse is significant at Angle=0, but not at other levels of Angle
This pattern is apparent in the plot of means - people are faster to respond to the same image at Angle=0 than a different image at Angle=0, but same/different doesn’t matter at Angles greater than 0.
We might write up this result like this (paraphrased from Ganis & Kievit (2015)): A two-way repeated measures ANOVA with angle of rotation (0, 50, 100, and 150 degrees) and trial type (same, different) as factors showed that RTs were influenced by angle of rotation (F(1.94,102.86) = 193.12, p < .0001, Greenhouse-Geisser correction for non-sphericity, partial η² = .785). Trial type influenced RTs (F(1,53)=19.72, p<.0001, partial η² = .271, and this effect varied by angle of rotation (F(2.68,141.90)=22.27, p < .0001, partial η² = .30). Simple effects analysis confirmed that angle of rotation increased RTs in both trial type conditions (same: F(3,53) = 81.23, p < .0001; different: F(3,53) = 43.54, p < .001), but the effect of trial type was significant only at the lowest angle of rotation ( angle 0: F(1,53) = 67.391, p <.0001; angle 50: F(1,53) = 2.08, p = 0.154; angle 100: F(1,53) = 2.54, p = 0.117; angle 150: F(1,53) = 0.44, p = 0.509).
Note: Why use partial eta-squared rather than generalized eta-squared? You could make an argument either way - but mental rotation is generally a phenomenon examined within subjects, so partial eta-squared makes sense for comparison across the studies that examine this phenomenon.

What if I’m still worried about assumption violations, like if there are bad outliers?

For a walkthrough of assumption checks for ANOVA models in afex, see this page written by the creator of the “afex” package
There is no simple tool for a non-parametric or robust alternative to factorial repeated measures ANOVA, so what can you do? Consider:
- can you characterize the data better by reporting results with and without outliers?
- can you use one of the non-parametric methods we’ve covered earlier to model parts of the experiment? (e.g., paired comparisons with Wilcoxon signed rank tests, or one-way robust ANOVA with WRS2::rmanova())

Extra - using planned contrasts

see the code below for an example of how to generate linear, quadratic, and cubic contrasts for the Angle variable (like in the “Tests of Within Subjects Contrasts” table in SPSS)
for further explanation see this example at https://nmmichalak.github.io/nicholas_michalak/blog_entries/2018/nrg07/nrg07.html

rt_rm |> emmeans::lsmeans(specs= ~ Angle) |> emmeans::contrast(method = "poly")

##  contrast  estimate    SE df t.ratio p.value
##  linear        4661 276.5 53  16.858  <.0001
##  quadratic     -369  65.6 53  -5.628  <.0001
##  cubic         -417 137.6 53  -3.028  0.0038
## 
## Results are averaged over the levels of: DesiredResponse

That’s all for this activity!

References

Chapter 15 of Field textbook: Field, A.P. (2018). Discovering Statistics Using IBM SPSS Statistics. 5th Edition. London: Sage.
Ganis, G., & Kievit, R. (2015). A new set of three-dimensional shapes for investigating mental rotation processes: validation data and stimulus set. Journal of Open Psychology Data.
Lakens, D. (2013). Calculating and reporting effect sizes to facilitate cumulative science: a practical primer for t-tests and ANOVAs. Frontiers in psychology, 4, 863.
Testing Assumptions for models fit with the afex package

Repeated Measures Analysis in R

Lobue & Bhanji - Statistical Methods

Goals for today

Step 1 - Get organized

Step 2 - Import data and check it out

What to do:

Step 3 - Fit a two-way repeated measures ANOVA model

Above is the decision process chart from the Field textbook.

Understanding the output

Answer the following questions for yourself:

Step 4 - follow up comparisons to interpret ANOVA results

Understanding the output

What if I’m still worried about assumption violations, like if there are bad outliers?

Extra - using planned contrasts

That’s all for this activity!

References