SPSS Lab activity - Moderation and Mediation Models

Jamil Palacios Bhanji and Vanessa Lobue
Last edited Oct 19, 2022

Goals for today


Step 0 - Get organized, import data


Step 0.1 - Start SPSS and import the data

If you have installed the PROCESS macro for SPSS, check that you see a menu item under Analyze->Regression called "Process v4.1 by Andrew Hayes". If you installed it before and don't see it now, you may have to run SPSS as an administrator (right-click on the SPSS icon and select "Run as administrator"). If you don't have admin access on your computer, please join forces with a partner who already has the PROCESS macro installed. For instructions on installing PROCESS, see this video (first 4 minutes). You can download PROCESS here: https://www.processmacro.org/download.html

Data description: lumos_subset1000plusimaginary.csv is the same file we have been working with. Today we will make use of a fabricated variable called imaginary_screensize, which gives the size of the screen on which users complete the tests. This is a simulated variable that is not part of the real dataset (the same is true of eyesight_z, which we will use in Step 2).

This is a subset of a public dataset of Lumosity (a cognitive training website) user performance data. You can find the publication associated with the full dataset here:
Guerra-Carrillo, B., Katovich, K., & Bunge, S. A. (2017). Does higher education hone cognitive functioning and learning efficacy? Findings from a large and diverse sample. PLoS ONE, 12(8), e0182276. https://doi.org/10.1371/journal.pone.0182276

Import the data: open SPSS and use File->Import Data->CSV Data (or Text Data), then check the variable types and add labels if you wish. Careful! If you use "Text Data", make sure that "comma" is the only delimiter selected (SPSS may also treat "space" as a delimiter automatically, so uncheck that option).

What to do next:


Step 1 - Moderation - when the relation between two variables depends on the level of another variable

[Figure: conceptual model of moderation]

Above: the conceptual model of moderation

[Figure: statistical model of moderation]

Above: the statistical model of moderation - notice that X1 and X2 are interchangeable in the statistical model (i.e., the statistical model does not distinguish between the predictor and the moderator)

Previously, when we looked just at age and raw_score, we saw a small association such that older participants scored lower. But what if that association depended on another variable, such as the size of their screen? That is, maybe older individuals do worse than younger individuals only if they are working on a small screen? Such a relationship would be an example of an interactive effect, or moderation (i.e., screen size moderates the effect of age on raw score, or it might equally be stated that age moderates the effect of screen size on raw score).
So far, the data we have worked with are real values from Lumosity, but for this step there is a simulated imaginary variable called imaginary_screensize - just for educational purposes. In the moderation model we will test, age is X1 (predictor), imaginary_screensize is X2 (moderator), and raw_score is Y (outcome).

Step 1.2 - model an interaction between 2 continuous variables

As depicted in the statistical model of moderation graphic above, we test moderation with a regression model in which the outcome (Y) is explained by a predictor (X1), a moderator (X2), and their product (X1*X2, called the interaction of X1 and X2). To start out, we will use the same linear regression procedure that we used last week.
Try it now:
1. First, create a new variable called ageXscreensize by using the Transform->Compute Variable menu (the Numeric Expression should be "age*imaginary_screensize")
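2. Now go to Analyze->Regression->Linear and specify raw_score as the Dependent variable and age, imaginary_screensize, and ageXscreensize as the Independent(s), then click OK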

Examine the model output

  1. Scan through the Model Summary (we can reject the null hypothesis that all coefficients are zero) and then focus on the Coefficients Table. Notice that there are coefficient estimates for age, imaginary_screensize, and their interaction ageXscreensize.

  2. The positive coefficient (with low p-value) for the interaction term suggests that at larger screensize values, the relation of age to performance is more positive (less negative, to be precise) than at smaller screensize values. Equivalently, at older age values, the relation of screensize to performance is more positive/less negative. This follows from the model itself: the slope of age at a given screensize equals the age coefficient plus the interaction coefficient multiplied by that screensize value. The coefficient for the interaction term is read like the other coefficients: a one-unit increase in the predictor predicts a .002-unit increase in the outcome, holding the other predictors constant. Of course, the predictor in this case is the product of two variables, so interpretation takes a little more work, which we will do in Step 1.4.

  3. But we now have issues with the interpretation of the coefficients for the main effects of age and imaginary_screensize. In a model with an interaction, the coefficient for a single variable represents the effect of that variable when the other variable in the interaction is zero. So the coefficient for imaginary_screensize (the main effect of screensize) now represents the effect of screensize at age = 0, and the coefficient for age represents the effect of age when screensize = 0. Neither effect is meaningful here, because no participant is 0 years old and no screen has a size of 0; both values lie outside the range of our data.
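The interpretation points above can be checked numerically. The sketch below is written in Python (standard library only) rather than SPSS, purely to show the arithmetic; it uses simulated stand-in data, not lumos_subset1000plusimaginary.csv, and the helper names (ols, predict, slope_of_x1) are illustrative. It fits Y = b0 + b1*X1 + b2*X2 + b3*X1*X2 by least squares and shows that the slope of X1 at a given X2 value w is b1 + b3*w, so b1 on its own is the slope at X2 = 0.

```python
import random


def ols(predictors, y):
    """Least-squares coefficients [b0, b1, ..., bk] via the normal equations (X'X)b = X'y."""
    rows = [[1.0] + [col[i] for col in predictors] for i in range(len(y))]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    aug = [[sum(r[p] * r[q] for r in rows) for q in range(k)]
           + [sum(r[p] * yi for r, yi in zip(rows, y))] for p in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        piv = max(range(c, k), key=lambda i: abs(aug[i][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for i in range(k):
            if i != c:
                f = aug[i][c] / aug[c][c]
                aug[i] = [u - f * v for u, v in zip(aug[i], aug[c])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def predict(b, x1, x2):
    """Prediction from Y = b0 + b1*X1 + b2*X2 + b3*X1*X2."""
    return b[0] + b[1] * x1 + b[2] * x2 + b[3] * x1 * x2


def slope_of_x1(b, x2):
    """Slope of X1 when X2 is held at x2: b1 + b3*x2 (so b1 alone is the slope at X2 = 0)."""
    return b[1] + b[3] * x2


# Simulated stand-in data, NOT the Lumosity file (true values chosen arbitrarily)
rng = random.Random(1)
age = [rng.uniform(18, 80) for _ in range(500)]
screen = [rng.uniform(10, 30) for _ in range(500)]
score = [20 - 0.4 * a + 0.2 * s + 0.015 * a * s + rng.gauss(0, 2)
         for a, s in zip(age, screen)]
age_x_screen = [a * s for a, s in zip(age, screen)]   # plays the role of ageXscreensize

b = ols([age, screen, age_x_screen], score)
print("b0, b_age, b_screen, b_interaction:", [round(v, 3) for v in b])
for w in (0, 10, 20, 30):
    print(f"slope of age when screen = {w}: {slope_of_x1(b, w):.3f}")
```

The printed slopes at screen = 0 fall far outside any realistic screen size, which is exactly why the raw age coefficient is hard to interpret.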

Step 1.3 - Mean-center the variables

*Create new variables storing the means of the original variables.
aggregate outfile=* mode=addvariables
/mean_age = mean(age)
/mean_screensize = mean(imaginary_screensize).
*Subtract the mean from the original values (mean-centering).
compute age_cent = age - mean_age.
compute screensize_cent = imaginary_screensize - mean_screensize.
*Interaction term built from the centered variables.
compute age_centXscreensize_cent = age_cent * screensize_cent.
execute.
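Here is a minimal Python sketch (standard library only, simulated stand-in data, illustrative names) of what the syntax above accomplishes. After mean-centering, the interaction coefficient should be unchanged, while the coefficient for age becomes the slope of age at the mean screen size (and likewise for screen size at the mean age). This is a general property of centering in a model with a product term, not something specific to this dataset.

```python
import random


def ols(predictors, y):
    """Least-squares coefficients [b0, b1, ..., bk] via the normal equations (X'X)b = X'y."""
    rows = [[1.0] + [col[i] for col in predictors] for i in range(len(y))]
    k = len(rows[0])
    aug = [[sum(r[p] * r[q] for r in rows) for q in range(k)]
           + [sum(r[p] * yi for r, yi in zip(rows, y))] for p in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(c, k), key=lambda i: abs(aug[i][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for i in range(k):
            if i != c:
                f = aug[i][c] / aug[c][c]
                aug[i] = [u - f * v for u, v in zip(aug[i], aug[c])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


# Simulated stand-in data, NOT the Lumosity file
rng = random.Random(2)
age = [rng.uniform(18, 80) for _ in range(500)]
screen = [rng.uniform(10, 30) for _ in range(500)]
score = [20 - 0.4 * a + 0.2 * s + 0.015 * a * s + rng.gauss(0, 2)
         for a, s in zip(age, screen)]

# Mean-centering, mirroring the aggregate/compute syntax above
mean_age = sum(age) / len(age)
mean_screen = sum(screen) / len(screen)
age_cent = [a - mean_age for a in age]
screen_cent = [s - mean_screen for s in screen]

b_raw = ols([age, screen, [a * s for a, s in zip(age, screen)]], score)
b_cent = ols([age_cent, screen_cent,
              [a * s for a, s in zip(age_cent, screen_cent)]], score)

print("interaction, raw vs centered:", round(b_raw[3], 5), round(b_cent[3], 5))
# In the centered model, b_age is the slope of age at the MEAN screen size
print("centered b_age:", round(b_cent[1], 4),
      "| raw b_age + b_int * mean_screen:", round(b_raw[1] + b_raw[3] * mean_screen, 4))
```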

Now look at the output and notice:

Step 1.4 - Calculate simple slopes to interpret the interaction
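A simple slope is the slope of age at a chosen value w of the (centered) moderator: b1 + b3*w, with standard error sqrt(Var(b1) + 2w*Cov(b1,b3) + w^2*Var(b3)). The Python sketch below (standard library only, simulated stand-in data, illustrative function names) computes it at the mean and at one standard deviation above and below the mean, which is a common choice of values. PROCESS may use different default values (for example, percentiles of the moderator), so check its output for the values it actually used.

```python
import math
import random
import statistics


def fit_with_cov(predictors, y):
    """OLS coefficients and their covariance matrix, sigma^2 * (X'X)^-1."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    # Invert X'X with Gauss-Jordan elimination on [X'X | I]
    aug = [xtx[p] + [1.0 if q == p else 0.0 for q in range(k)] for p in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda i: abs(aug[i][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        d = aug[c][c]
        aug[c] = [v / d for v in aug[c]]
        for i in range(k):
            if i != c:
                f = aug[i][c]
                aug[i] = [u - f * v for u, v in zip(aug[i], aug[c])]
    inv = [row[k:] for row in aug]
    b = [sum(inv[p][q] * xty[q] for q in range(k)) for p in range(k)]
    resid = [yi - sum(bj * xj for bj, xj in zip(b, r)) for r, yi in zip(rows, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)   # residual variance
    cov = [[sigma2 * inv[p][q] for q in range(k)] for p in range(k)]
    return b, cov


def simple_slope(b, cov, w):
    """Slope of X1 at moderator value w, with SE from Var(b1) + 2w Cov(b1,b3) + w^2 Var(b3)."""
    est = b[1] + b[3] * w
    var = cov[1][1] + 2 * w * cov[1][3] + w * w * cov[3][3]
    return est, math.sqrt(var)


# Simulated stand-in data, NOT the Lumosity file
rng = random.Random(3)
age = [rng.uniform(18, 80) for _ in range(500)]
screen = [rng.uniform(10, 30) for _ in range(500)]
score = [20 - 0.4 * a + 0.2 * s + 0.015 * a * s + rng.gauss(0, 2)
         for a, s in zip(age, screen)]
mean_age, mean_screen = statistics.mean(age), statistics.mean(screen)
age_c = [a - mean_age for a in age]
screen_c = [s - mean_screen for s in screen]
inter_c = [a * s for a, s in zip(age_c, screen_c)]

b, cov = fit_with_cov([age_c, screen_c, inter_c], score)
screen_sd = statistics.stdev(screen_c)
for label, w in (("mean - 1 SD", -screen_sd), ("mean", 0.0), ("mean + 1 SD", screen_sd)):
    est, se = simple_slope(b, cov, w)
    print(f"{label:>11}: slope of age = {est:.4f}, SE = {se:.4f}, t = {est / se:.2f}")
```

With a sample this large, a |t| above roughly 1.96 corresponds to p < .05; PROCESS reports exact p-values from the t distribution.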

Understand the output and interpret the interaction

Now that you've looked at the output, answer the following questions about the model in your notes:

1) What is the relation between age and performance (raw_score) when screen size is held at a low value?
2) What is the relation between age and performance when screen size is held at its average (mean) value?
3) What is the relation between age and performance when screen size is held at a large value?
4) Can you translate those "significance regions" cutoffs into the original (not mean-centered) imaginary_screensize units?
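For question 4: significance-region cutoffs are usually found with the Johnson-Neyman technique, which solves for the moderator values w at which the simple slope's t ratio, (b1 + b3*w)/SE(w), equals the critical value. Because the model used centered screen size, converting a cutoff back to original units just means adding the sample mean of imaginary_screensize (the mean_screensize variable created in Step 1.3). In the Python sketch below, the coefficients, their variances and covariance (which SPSS's Linear Regression dialog can display under Statistics), and the mean of 20.0 are hypothetical placeholders, not values from the lab output, and 1.96 is a large-sample approximation to the critical t value.

```python
import math


def jn_boundaries(b1, b3, v11, v13, v33, t_crit=1.96):
    """Moderator values w where (b1 + b3*w) / SE(w) equals +/- t_crit.

    Solves (b1 + b3*w)**2 = t_crit**2 * (v11 + 2*w*v13 + w**2 * v33),
    which is a quadratic A*w**2 + B*w + C = 0.
    """
    t2 = t_crit ** 2
    A = b3 ** 2 - t2 * v33
    B = 2 * (b1 * b3 - t2 * v13)
    C = b1 ** 2 - t2 * v11
    if A == 0:                      # degenerate (linear) case
        return [] if B == 0 else [-C / B]
    disc = B * B - 4 * A * C
    if disc < 0:
        return []                   # the slope never crosses the significance threshold
    root = math.sqrt(disc)
    return sorted([(-B - root) / (2 * A), (-B + root) / (2 * A)])


def to_original_units(centered_values, mean_value):
    """Undo mean-centering: original = centered + mean."""
    return [w + mean_value for w in centered_values]


# Hypothetical numbers, NOT from the lab output: b1 = age coefficient,
# b3 = interaction coefficient, v = their variances/covariance
bounds = jn_boundaries(b1=-0.05, b3=0.01, v11=1e-4, v13=0.0, v33=1e-6)
print("cutoffs (centered units):", [round(w, 3) for w in bounds])
print("cutoffs (original units):", [round(w, 3) for w in to_original_units(bounds, 20.0)])
```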


Step 2: Mediation model

[Figure: mediation chart]

Above: the simple three variable model of mediation - notice that X and M are distinguished in the model (unlike X1 and X2 in the statistical moderation model)

What is Mediation?

Mediation refers to a situation in which the relationship between a predictor variable (X in the chart above) and an outcome variable (Y in the chart above) can be explained, at least in part, by their relationships to a third variable (the mediator, M).
Forget about the screen size variable for a moment and consider possible explanations for the negative relation between age and performance that we saw when age was our only predictor for raw_score.
Maybe part of the relation could be explained by something like eyesight that deteriorates with age. The data file you imported for this activity has a new (simulated/imaginary) variable called eyesight_z which is an eyesight "score" where higher values indicate better eyesight (it has been scaled such that the sample mean is 0 and the s.d. is 1).

We will test a mediation model where eyesight_z explains the relation between age and raw_score.

A note about causality: In this model we have good reasons for thinking the direction of the relationships is as specified in the mediation model (i.e., it would not be possible for a change in eyesight to cause a change in age, or for a change in test performance to cause a change in eyesight) but there may be many unmeasured variables that could be involved. The test of our mediation model will tell us whether the eyesight_z measure accounts for a "significant" part of the relationship between age and raw_score.
There are four conditions for our mediation model test, and they are tested with three regression models. First we will list the regression models (the coefficients differ between models, so we'll refer to each with a unique subscript, b1 through b4):

  1. Y = intercept + b1*X
  2. M = intercept + b2*X
  3. Y = intercept + b3*X + b4*M
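To make the three models concrete, the Python sketch below (standard library only; simulated stand-in data with arbitrary effect sizes, not the lab's eyesight_z or raw_score; illustrative function names) estimates c, a, c', and b by ordinary least squares. With OLS and the same cases in every model, the total effect splits exactly into direct plus indirect parts, c = c' + a*b, so the difference between the total and direct effects equals the indirect effect a*b.

```python
import random


def centered_cross_sum(u, v):
    """Sum of (u - mean(u)) * (v - mean(v))."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))


def mediation_paths(x, m, y):
    """OLS estimates of c, a, c', and b from the three regression models."""
    sxx, smm = centered_cross_sum(x, x), centered_cross_sum(m, m)
    sxm = centered_cross_sum(x, m)
    sxy, smy = centered_cross_sum(x, y), centered_cross_sum(m, y)
    c = sxy / sxx                              # model 1: Y = intercept + b1*X, b1 = c
    a = sxm / sxx                              # model 2: M = intercept + b2*X, b2 = a
    det = sxx * smm - sxm ** 2                 # model 3: Y = intercept + b3*X + b4*M
    c_prime = (smm * sxy - sxm * smy) / det    # b3 = c'
    b = (sxx * smy - sxm * sxy) / det          # b4 = b
    return {"c": c, "a": a, "b": b, "c_prime": c_prime}


# Simulated stand-in data (hypothetical effects, NOT the lab file)
rng = random.Random(5)
age = [rng.uniform(18, 80) for _ in range(300)]
eyesight = [-0.03 * (a - 50) + rng.gauss(0, 1) for a in age]
score = [30 + 2.0 * e - 0.05 * a + rng.gauss(0, 3) for a, e in zip(age, eyesight)]

p = mediation_paths(age, eyesight, score)
print({k: round(v, 4) for k, v in p.items()})
print("total - direct (c - c'):", round(p["c"] - p["c_prime"], 4))
print("indirect (a * b):       ", round(p["a"] * p["b"], 4))
```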

We use these three models to check four conditions of mediation (section 11.4.2 of the Field textbook):
1. the predictor variable must significantly predict the outcome variable in model 1 (c is significantly different from 0)
2. the predictor variable must significantly predict the mediator in model 2 (a is significantly different from 0)
3. the mediator must significantly predict the outcome variable in model 3 (b is significantly different from 0)
4. the predictor variable must predict the outcome variable less strongly in model 3 than in model 1 (c' is closer to 0 than c, in other words, the direct effect is smaller than the total effect)
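The four checks can be written down as a tiny function. This is an illustrative Python sketch with hypothetical inputs (the estimates and p-values are made up, not results from the lab data); it only encodes the conditions as listed above, so it is no substitute for the bootstrapped test of the indirect effect that PROCESS reports.

```python
def mediation_conditions(c, c_prime, p_c, p_a, p_b, alpha=0.05):
    """Check the four conditions, given path estimates and p-values from the three models."""
    return {
        "1. X predicts Y in model 1 (c)": p_c < alpha,
        "2. X predicts M in model 2 (a)": p_a < alpha,
        "3. M predicts Y in model 3 (b)": p_b < alpha,
        "4. |c'| is smaller than |c|": abs(c_prime) < abs(c),
    }


# Hypothetical inputs for illustration only (not results from the lab data)
result = mediation_conditions(c=-0.11, c_prime=-0.05, p_c=0.001, p_a=0.001, p_b=0.001)
for condition, met in result.items():
    print(f"{condition}: {'met' if met else 'not met'}")
```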

Note that we are using a series of linear regression models to test the mediation model, so the assumptions that we need to check are the same ones we discussed in the multiple regression lab activity, and the same as the ones we should have checked in the moderation example above. (But also notice that we will be using bootstrapping to estimate the mediated effect; bootstrapping does not assume that the sampling distribution of the indirect effect is normal, which eases concerns about significance tests on that estimate when assumptions are violated.) We won't check assumptions here (to save a little time), but it is a good exercise to check the plots of residuals from each model if you have extra time.
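PROCESS does the bootstrapping for you, but the idea is simple enough to sketch: resample cases with replacement, recompute a*b in each resample, and take the middle 95% of those estimates as the confidence interval (a percentile bootstrap). The Python sketch below uses simulated stand-in data and illustrative names; PROCESS's settings (number of resamples, random seed) may differ, so its intervals will not match this sketch exactly.

```python
import random


def indirect_effect(x, m, y):
    """a*b: a from M ~ X, b = coefficient of M in Y ~ X + M (ordinary least squares)."""
    n = len(x)
    mx, mm, my = sum(x) / n, sum(m) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    smm = sum((mi - mm) ** 2 for mi in m)
    sxm = sum((xi - mx) * (mi - mm) for xi, mi in zip(x, m))
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    smy = sum((mi - mm) * (yi - my) for mi, yi in zip(m, y))
    a = sxm / sxx
    b = (sxx * smy - sxm * sxy) / (sxx * smm - sxm ** 2)
    return a * b


def percentile_bootstrap_ci(x, m, y, n_boot=1000, level=0.95, seed=0):
    """Resample cases with replacement, recompute a*b, and take the percentile interval."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # one bootstrap resample of cases
        estimates.append(indirect_effect([x[i] for i in idx],
                                         [m[i] for i in idx],
                                         [y[i] for i in idx]))
    estimates.sort()
    k = round(n_boot * (1 - level) / 2)              # estimates trimmed from each tail
    return estimates[k], estimates[n_boot - 1 - k]


# Simulated stand-in data (hypothetical effects, NOT the lab file)
rng = random.Random(7)
age = [rng.uniform(18, 80) for _ in range(200)]
eyesight = [-0.03 * (a - 50) + rng.gauss(0, 1) for a in age]
score = [30 + 2.0 * e - 0.05 * a + rng.gauss(0, 3) for a, e in zip(age, eyesight)]

estimate = indirect_effect(age, eyesight, score)
lo, hi = percentile_bootstrap_ci(age, eyesight, score)
print(f"indirect effect a*b = {estimate:.4f}, 95% percentile bootstrap CI [{lo:.4f}, {hi:.4f}]")
```

An interval that excludes 0 is the usual evidence for a nonzero indirect effect.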

Step 2.1 - What are the models that we will use to test the four conditions of mediation?

X corresponds to age, Y to raw_score, and M to eyesight_z, so the three models we use to test the conditions are:
1. raw_score = intercept + b1*age [b1 = path c]
2. eyesight_z = intercept + b2*age [b2 = path a]
3. raw_score = intercept + b3*age + b4*eyesight_z [b3 = path c' and b4 = path b]

Step 2.2 - Use the PROCESS macros to estimate the three models

Go to Analyze->Regression->"Process v4.1 by Andrew Hayes" (the version number in the menu name may differ on your computer):
- specify raw_score as the Y variable
- specify age as the X
- specify eyesight_z as the M
- make sure to remove imaginary_screensize as the Moderator W if it is still there from before
- select Model number 4 (this refers to Hayes' label for a mediation model, unrelated to our description of the 3 regression models that are used to test a mediation)
- Under Options, select "Show total effect model ..."
- Under Options, select "Standardized effects ..."
- Under Long variable names, select "accept the risk", but be warned that variable names in the dataset must then be unique within their first 8 characters
- Click OK to estimate the models (this will run the three regression models that we discussed above as required for evidence of mediation)

Step 2.3 - Understand the output of PROCESS

Let's look closely at the output from top to bottom:

Answer the following questions in your notes (these are the pieces of information you would report in a manuscript):

  1. What is the "total effect" (path c) of age on performance? (coefficient, standard error of the coefficient, confidence interval around the coefficient, R², F, p)
  2. What is the relation of age to eyesight (path a)? (coefficient, standard error of the coefficient, confidence interval around the coefficient)
  3. What is the relation of eyesight to performance, controlling for age (path b)? (coefficient, standard error of the coefficient, confidence interval around the coefficient)
  4. What is the "direct effect" of age on performance, controlling for eyesight (path c')? (coefficient, standard error of the coefficient, confidence interval around the coefficient)
  5. What is the indirect effect of age on performance, through eyesight (path a*b)? (coefficient, standard error of the coefficient, confidence interval around the coefficient)

A note on effect size measures for the indirect effect

The "Completely standardized indirect effect(s) of X on Y" section produced by PROCESS is essentially a standardized regression coefficient, and as such it can be compared across studies (which is useful for meta-analyses). We could also try to compute an R²-like measure for the indirect effect, but such measures are difficult to interpret, so we recommend the standardized indirect effect (see section 11.4.3 of the Field textbook for a full discussion).
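The completely standardized indirect effect is usually defined as a*b*SD(X)/SD(Y), i.e., the indirect effect expressed in standard deviations of Y per standard deviation of X. The Python sketch below (standard library only, simulated stand-in data, illustrative names) computes it that way and, as a check, also computes a*b after z-scoring all three variables; the two numbers should agree.

```python
import random
import statistics


def paths_a_b(x, m, y):
    """OLS path a (M ~ X) and path b (coefficient of M in Y ~ X + M)."""
    mx, mm, my = statistics.mean(x), statistics.mean(m), statistics.mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    smm = sum((mi - mm) ** 2 for mi in m)
    sxm = sum((xi - mx) * (mi - mm) for xi, mi in zip(x, m))
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    smy = sum((mi - mm) * (yi - my) for mi, yi in zip(m, y))
    return sxm / sxx, (sxx * smy - sxm * sxy) / (sxx * smm - sxm ** 2)


def completely_standardized_indirect(x, m, y):
    """a * b * SD(X) / SD(Y): the indirect effect in standard-deviation units of X and Y."""
    a, b = paths_a_b(x, m, y)
    return a * b * statistics.stdev(x) / statistics.stdev(y)


def z_scores(v):
    """Rescale to mean 0 and standard deviation 1."""
    mu, sd = statistics.mean(v), statistics.stdev(v)
    return [(vi - mu) / sd for vi in v]


# Simulated stand-in data (hypothetical effects, NOT the lab file)
rng = random.Random(8)
age = [rng.uniform(18, 80) for _ in range(300)]
eyesight = [-0.03 * (a - 50) + rng.gauss(0, 1) for a in age]
score = [30 + 2.0 * e - 0.05 * a + rng.gauss(0, 3) for a, e in zip(age, eyesight)]

a, b = paths_a_b(age, eyesight, score)
a_z, b_z = paths_a_b(z_scores(age), z_scores(eyesight), z_scores(score))
print("unstandardized indirect effect a*b:", round(a * b, 4))
print("completely standardized:", round(completely_standardized_indirect(age, eyesight, score), 4))
print("a*b after z-scoring X, M, and Y:", round(a_z * b_z, 4))
```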

That's all for this part. Have some fun in RStudio now!


References