Activity #4 - Data visualization in SPSS

updated Sep 29 2022

Learning Objectives

Learn basic chart types using SPSS that are appropriate for common purposes
- condition means (bar plot, line plot)
- condition means with a grouping variable (barplot, line plot)
- relation between 2 variables (scatter)
- relation between 2 variables with a grouping variable (grouped scatter)
- alternative methods to better visualize variability
start to develop an understanding of the strengths and weaknesses of different approaches to visualize data

Tips for using Virtual Computing

It is convenient to use Rutgers Box to store your files on the cloud. If you have activated the Box service for your netID (use the "Service Activation" link on , you will see a "Box" folder on your Virtual Computing Desktop and it will appear in Windows Explorer as a folder where you can save files. This way the work you save will still be in your Box folder when you close your connection to the Virtual Computing Desktop.
If you have problems with windows or menus extending off the screen, try
- auto-hiding the taskbar (right-click on the Virtual Desktop taskbar, "taskbar settings", "Automatically hide the taskbar in desktop mode")
- increasing resolution on your home display
From the virtual desktop, open a web browser and log in to Canvas so that you can go to the lab activity guide links from within the Virtual Desktop (this way you can download the files you need within your Virtual Desktop session)

Step 1 - Get organized and download the data file for this activity

1.1 Set up a typical project work flow consisting of a project folder ("data-visual") containing

"data" folder to store raw data files for this activity
- Download the data file mentalrotation.csv for today and put it in the "data" folder
"spss" folder to store all spss files for this activity
"r_docs" folder to store your lab notes in R markdown
when you get to the RStudio activity, you can create an Rproj project file for this folder

Step 2 - import the data into SPSS

Start SPSS and import the mentalrotation.csv data file:

File -> Import Data-> Text Data
Go through the dialog boxes to import. The first line contains variable names and the delimiter is "Comma"
Pay attention to the variable type and measure type of each column. What happens to the columns that have "NA" values? Make sure that the "Trial", "Time", and "Age" columns are variable type "Numeric" and measure type "scale". "Angle" should be variable type "Numeric" but measure type "Ordinal"
Open a file to save your notes on this activity (use whatever text editor/word processor you prefer, or take notes directly into an SPSS syntax file: File->New->Syntax)

Description of the data file

This data is from Ganis and Kievit (2016), a replication of the Shephard and Metzler (1971).
In this study subjects had to mentally rotate a 3D shape and respond whether it was the same or different compared to a reference shape. The angle of rotation was manipulated (within subjects) at 0, 50, 100, and 150 degrees as well as the desired response (whether the shape was actually same or different). The image below shows an example in the 150 degree condition (top is "same", bottom is "different")
Each line in the MentalRotationBehavioralData.csv file represents 1 trial. The Time column is response time in milliseconds. Missed responses are coded as NA in the Time column and "[N/A]" in the ActualResponse column.

image from Ganis and Kievit (2016)

Step 3 - plot means for each Angle condition

Note: This procedure is just one approach to create the graph in SPSS. There may be other, better ways (for this and other graphs) and you are welcome to use your preferred approach.

The first plot we will make is a simple bar chart of mean response times, with one bar for each "Angle" level.
To do this, first we will make a new dataset where we aggregate across trials, so that for each participant we have an average response time for each of the 4 conditions (4 rows per participant in the new datafile)
- Use the Data->Aggregate feature to create the new dataset
- Enter Participant & Angle as the "break" variables
- Enter Time in the "Summaries of Variable(s)" box
- "MEAN" is used as the aggregating function by default
- Select "Create a new dataset containing only the aggregated variables" and enter a descriptive name like "mental-rotation-agg-bysubj"
- check the box to sort the file before aggregating
- When you click "OK" you will get a new datafile, you can save it with the same filename you used to name the dataset - make sure you keep this dataset as the active dataset while doing the next steps
- go to the Variable View, and enter "response time" as the label for Time_mean
Now, with the aggregated dataset active, go to Graphs->Chart Builder
- First select the "simple bar" chart by dragging it from the bottom "Gallery" tab to the top Preview area (the area that says "drag a Gallery chart here")
- Now drag the Angle variable to the x-axis of the bar chart in the preview window
- Now drag the Time_mean variable to the y-axis of the bar chart in the preview window
- Click OK
Does the bar chart look how you expected? What does it tell you about the pattern of results?

Error bars

Okay, so we can see the means, but there's no information at all about variability, so let's at least put error bars on the bar plot.
Go back to the Chart Builder (Graphs->Chart Builder) and this time check the box for "Display error bars", and specify that you want 95% confidence intervals (now click okay)
What do the error bars tell you in this new chart?

Show the means in a line plot

In some cases you might prefer a point and line plot instead of bars
Go back to the Chart Builder (Graphs->Chart Builder) and this time find the "Simple Line" plot in the Gallery and drag it to the preview window.
- drag the Angle and Time_mean variables to the x and y axes as you did before, and select the same error bars
Does this plot show you anything different than the bar plot?

Step 4 - group the bar plot by "DesiredResponse"

The column "DesiredResponse" (in the unaggregated file) indicates whether the target shape was the same or different than the reference shape. Let's treat the data like a 2x4 design and split each bar into two (one bar for "same" and one for "different").
to accomplish this, first we have to re-do the aggregation step to also group by "DesiredResponse" values
re-activate the first (unaggregated) dataset by clicking on it or selecting it through the "window" menu of SPSS
Now, do the same aggregation step as before (Data->Aggregate), but this time include Participant, Angle, and DesiredResponse as the "break" variables
- Select "Create a new dataset containing only the aggregated variables" and enter a descriptive name like "mental-rotation-agg-bysubj-and-desresp"
- When you click "OK" you will get a new datafile, you can save it with the same filename you used to name the dataset - make sure you keep this dataset as the active dataset while doing the next steps
- go to the Variable View, and enter "response time" as the label for Time_mean
Now, with the newly aggregated dataset active, go to Graphs->Chart Builder
- First select the "Clustered bar" chart type by dragging it from the bottom "Gallery" tab to the top Preview area
- Now drag the Angle variable to the x-axis of the bar chart in the preview window
- Now drag the Time_mean variable to the y-axis of the bar chart in the preview window
- Now drag the DesiredResponse variable to the "Cluster on X: set color" box in the preview window
- set the error bars for 95% confidence intervals
- Click OK
What does this new chart tell you about the data?

Now see if you can chart the same information using a line plot instead of a bar plot

make sure the newly aggregated dataset is the active one
go back to the chart builder and change the chart type
- use the "Multiple Lines" chart type and specify the x, y, and cluster variables as before

Step 5 - visualize a relationship between 2 variables

Is reaction time related to accuracy?
Let's compute accuracy for each subject, by counting the proportion of "Correct" values in the "CorrectResponse" column. We exclude trials with no response. Alternatively, you might have reasons to include these trials as incorrect for the accuracy calculation, but for this activity we will exclude them.
Go back to the original unaggregated dataset (where each row is a single trial)
First, select only cases that have a valid response time (Data->Select Cases, "If condition is satisfied", specify "Time > 0" for the condition
Now aggregate the data, similarly to how you did the first time, but in addition to "Time" in the "Summaries of Variables" box, add "CorrectResponse"
- because "CorrectResponse" is a nominal variable, you can't use MEAN as the aggregating function - instead you will use the PIN function (stands for "Percent IN")
- to specify the aggregating function, first highlight the "CorrectResponse..." entry in the "Summaries of Variables" box, then click the "Function" button
- now select "Percentages", "Inside", and type "Correct" (without the quotes, but match the upper/lowercase letters exactly) into both the "Low" and "High" text boxes
- Select "Create a new dataset containing only the aggregated variables" and enter a descriptive name like "mental-rotation-agg-bysubjwithaccuracy"
- check the box to sort the file before aggregating
- When you click "OK" you will get a new datafile, you can save it with the same filename you used to name the dataset - make sure you keep this dataset as the active dataset while doing the next steps
Look at the newly aggregated dataset (use the "Window" menu if you don't see it on your screen) - there should be 1 row per subject, and 3 columns (Participant, Time_mean, and CorrectResponse_pin)
go to the Variable View, and enter "response time" as the label for Time_mean, and "accuracy" as the label for CorrectResponse_pin

Now make a scatter plot of accuracy against response time

make sure the newly aggregated dataset is the active one
Graphs->Chart Builder
- drag the Scatter Plot chart type into the preview window
- put accuracy on the x-axis and response time on the y-axis
- include a linear trend line by checking the "total" box under "Linear Fit Lines"
Take a look at the scatter plot - is there any relation between the two variables?

Scatter plot of response time by accuracy, grouped by DesiredResponse

What if we wanted to split up subject RT and accuracy by DesiredResponse (whether the shapes were same or different)?
We need to re-aggregate the data by going back to the un-aggregated dataset, and this time specifying DesiredResponse as an additional "break" variable (everything else as you had it last time)
try it now (make sure you have excluded "no response" trials) - you should now get a new dataset with 2 rows for each subject
set labels for the variables in the newly aggregated dataset
Graphs->Chart Builder, use the same Scatter Plot chart type
put accuracy on the x-axis and response time on the y-axis
drag DesiredResponse to the "set Color" box in the preview window
click on "Subgroups" below "Linear Fit Lines" to get trendlines for the two conditions
Does this new chart show you anything new about the relation between accuracy and response time?

psych596

Graduate Statistical Methods for Psychology