Data Visualization in R - basic examples

Goals for today
Step 1 - Get organized
Step 2 - Import data and check it out
Step 3 - make a bar plot of response time means by Angle condition
Step 4 - group the bar plot by “DesiredResponse”
Step 5 - Visualize a relation between two variables
Step 6. Alternatives to visualise variability
Last Step - Saving your plots as image files
References

updated Sept 2022

Goals for today

Learn basic chart types using ggplot2 that are appropriate for common purposes
- condition means (bar plot, line plot)
- condition means with a grouping variable (barplot, line plot)
- relation between 2 variables
- relation between 2 variables with a grouping variable
- alternative methods to visualize variability
Learn strategies for visualizing variability
By going through examples, develop an beginner’s understanding of the “grammar of graphics” (Wickham, 2007) used by ggplot2 (geom, coordinate system, aesthetic component mapping, statistical function, layers, …)

Starting off notes

Today we are going to jump straight into plotting with the “ggplot2” package. The ggplot2 package will most likely not seem intuitive at first, but doing examples is a good way to get the hang of it. Andy Field’s “discovr” tutorial #3 is highly recommended for getting more comfortable with plotting in R.
For learning more about ggplot2 you can go through the official documentation (the cheat sheet is a good resource) and examples in the R-Graph-Gallery site. Other good resources are the data visualization chapter in R for Data Science, and Nordmann et al (preprint). “Data visualisation using R, for researchers who don’t use R” https://doi.org/10.31234/osf.io/4huvw

This activity borrows the from the PsyTeachR unit on data visualization developed at the University of Glasgow School of Psychology and Institute of Neuroscience and Psychology.

Step 1 - Get organized

If you haven’t already done so (i.e., you already made the folders for the SPSS activity):
1. make a folder for today’s activity- “data-visual”
- in this “data-visual” folder, create a new Rproj file (File->New Project->Existing)
2. in the “data-visual” folder, make a “data” folder
- Download the data file mentalrotation.csv (right-click, Save As) and place it in the new “data” folder
3. in this “data-visual” folder, make a “r_docs” folder
- start a new R Markdown doc and save it in your r_docs folder
- delete all the example text below the “setup” code chunk (that RStudio automatically creates when you make a new RMarkdown doc)
4. in this “data-visual” folder, make an “images” folder (leave it empty for now)
5. In the R markdown doc you started, edit the “setup” code chunk to include these lines:
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_knit$set(root.dir = rprojroot::find_rstudio_root_file())
library("tidyverse")

Step 2 - Import data and check it out

This data is from Ganis and Kievit (2016), a replication of Shephard and Metzler (1971).
In this study subjects had to mentally rotate a 3D shape and respond whether it was the same or different compared to a reference shape. The angle of rotation was manipulated (within subjects) at 0, 50, 100, and 150 degrees as well as the desired response (whether the shape was actually same or different).
Each line in the MentalRotationBehavioralData.csv file represents 1 trial. The Time column is response time in milliseconds. Missed responses are coded as NA in the Time column and “[N/A]” in the ActualResponse column.
What to do first: Make a new code chunk and use readr::read_csv() to read in the data. Make sure that NA values are handled the way you want (click on the tibble in the Environment window pane to take a quick look after you import the data).
What to do next: Let’s tell R to treat DesiredResponse, ActualResponse, CorrectResponse, and Sex as factors, using forcats::as_factor() (by default they are set to type character. Although all our code today will still work the same with these variables as character, using the factor datatype is an easy way to check the levels for each of your variables, and can help you avoid some unwanted errors (e.g., it’s easy to see some hard to find errors, like the values “Correct” and “correct” getting treated as different levels - you would be able to see this easily for any factor column by looking in the “Environment” tab in the top right pane). take a look at the solution to see how to use forcats::as_factor()

#first import the data
mrot_tib <- readr::read_csv("data/mentalrotation.csv", na = c("NA","[N/A]"))

# now make sure the columns we want as factors are treated that way, using forcats::as_factor() - we could let "Angle" be a Ratio variable 
mrot_tib <- mrot_tib %>% dplyr::mutate(
  DesiredResponse = forcats::as_factor(DesiredResponse),
  ActualResponse = forcats::as_factor(ActualResponse),
  CorrectResponse = forcats::as_factor(CorrectResponse),
  Sex = forcats::as_factor(Sex),
)

Now skip some things you would normally do: If you want to take a look at descriptives and distribution of response time, click the button:

#first import the data
mrot_tib <- readr::read_csv("data/mentalrotation.csv", na = c("NA","[N/A]"))

# now make sure the columns we want as nominal vals are treated as nominal, using forcats::as_factor()
mrot_tib <- mrot_tib %>% dplyr::mutate(
  DesiredResponse = forcats::as_factor(DesiredResponse),
  ActualResponse = forcats::as_factor(ActualResponse),
  CorrectResponse = forcats::as_factor(CorrectResponse),
  Sex = forcats::as_factor(Sex),
)

# first group by Participant and store in a new tibble
mrot_bysub <- mrot_tib %>% drop_na(Time) %>% 
  group_by(Participant) %>% 
  dplyr::summarise(
    sub_meanRT = mean(Time)
  ) %>% ungroup()
# then average across Participants and show table
mrot_bysub %>% dplyr::summarise(
  meanRT = mean(sub_meanRT),
  ci.low = ggplot2::mean_cl_normal(sub_meanRT)$ymin,
  ci.upp = ggplot2::mean_cl_normal(sub_meanRT)$ymax,
  median =  median(sub_meanRT),
  sd = sd(sub_meanRT),
  cases = n() - sum(is.na(sub_meanRT))
) %>% ungroup() %>% 
    knitr::kable(caption = "Response Time Descriptives", 
                 digits = 3) %>% 
    kableExtra::kable_styling(full_width = FALSE)

# now visualize the distribution
p2a <- mrot_bysub %>% drop_na(sub_meanRT) %>%
    ggplot( aes(x=sub_meanRT)) + geom_histogram(binwidth=200) + theme_classic() +
        labs (title = "Response Time distribution")
p2b <- mrot_bysub %>% drop_na(sub_meanRT) %>%
    ggplot( aes(y=sub_meanRT)) + geom_boxplot() + theme_classic() + 
        labs (title = "Response Time box plot")
p2c <- mrot_bysub %>% drop_na(sub_meanRT) %>%
    ggplot( aes(sample=sub_meanRT)) + geom_qq() + geom_qq_line() + theme_classic() +
        labs (title = "Response Time Q-Q")
p2a; p2b; p2c
mrot_bysub %>% {shapiro.test(.$sub_meanRT)}

Response Time Descriptives
meanRT	ci.low	ci.upp	median	sd	cases
2734.102	2544.947	2923.256	2618.95	693.007	54

## 
##  Shapiro-Wilk normality test
## 
## data:  .$sub_meanRT
## W = 0.97293, p-value = 0.2588

Step 3 - make a bar plot of response time means by Angle condition

The x-axis is the “Angle” condition (1,50,100,150), and the y-axis is the response time mean. For this exercise let’s first calculate means across trials for each subject, then calculate a sample mean across subjects, with sd based on the subject (by condition) means.
We will only use response times from correct response trials (CorrectResponse==“Correct”)
For most of these plots we recommend you reveal the solution immediately, then paste it into a code chunk in your own Rmd and check that it works (e.g., make sure the variable name matches for where you imported the data). Then you can look through the code and think about what each piece is for.

# A. first group data by Participant and Angle and store in a new tibble
mrot_bysub <- mrot_tib %>% filter(CorrectResponse=="Correct") %>% 
  group_by(Participant, Angle) %>% 
  dplyr::summarise(
    sub_meanRT = mean(Time)
  ) %>% ungroup()
# # ----uncomment this block to inspect Participant-level means
# mrot_bysub %>% head(8) %>% 
#   knitr::kable(caption = "Participant-level Mean RT (correct trials)", digits = 3) %>% kableExtra::kable_styling(full_width = FALSE)
# # -----------

# B. then group that data by Angle and average across Participants
mrot_summary <- mrot_bysub %>% 
  group_by(Angle) %>% 
  dplyr::summarise(
    meanRT = mean(sub_meanRT),
    ci.low = ggplot2::mean_cl_normal(sub_meanRT)$ymin,
    ci.upp = ggplot2::mean_cl_normal(sub_meanRT)$ymax,
  ) %>% ungroup()

# and show the group-level data in a table
mrot_summary %>% 
  knitr::kable(caption = "Group-level Mean RT", digits = 3) %>% 
  kableExtra::kable_styling(full_width = FALSE)

# C. now we can make the plot
p3a <- mrot_summary %>% 
  ggplot(aes(x = Angle, y = meanRT)) + 
    geom_bar(stat = "identity") +
    coord_cartesian(ylim = c(0, 4500)) + 
    theme_classic() + labs(title="Correct trial response time means", y = "mean response time (ms)", x = "angle of disparity (degrees)")
p3a

Group-level Mean RT
Angle	meanRT	ci.low	ci.upp
0	1931.013	1777.462	2084.565
50	2518.084	2312.799	2723.369
100	3105.220	2877.041	3333.400
150	3293.210	3075.000	3511.421

A few key elements in the code:

First we compute participant means for each Angle condition (part A). The filter(CorrectResponse=="Correct") makes sure we only use trials where the response was correct. Then we take the mean across subjects to get group level means for each Angle condition (part B). In general, it is a good idea to use %>% ungroup() to remove hidden grouping information that R stores when you use group_by() and summarise(), or you may get unexpected results if you use the data object in other functions.
the plotting command is all in the block of code (part C) containing the ggplot() function - mrot_summary %>% sends the summary stats to the ggplot() functions.
ggplot(aes(x = Angle, y = meanRT)) sets the “Angle” column of the data to the x-axis and the “meanRT” column to the y-axis. These are called “aesthetics” (hence the aes() function) in ggplot terminology.
After the main ggplot() function we add “layers”. The main layer is the bars that are created by geom_bar(). We specify stat = "identity" because the statistic we want to plot is already calculated in the mrot_summary tibble the we sent as input to ggplot(). If instead we sent a tibble of participant-wise means, we could specify stat = "summary", fun = "mean" to have ggplot compute the group means.
coord_cartesian(ylim = c(0, 4500)) controls the range of the y-axis (we could add xlim = if we wanted to). labs() controls the labels. theme_classic is one of many options for themes that specify color, font, positioning, etc.

Error bars

Okay, so we can see the means, but there’s no information at all about variability, so let’s at least put error bars on the bar plot.
Let’s set them to equal to the 95% confidence interval around the mean. The solution above already calculated ci.low and ci.upp, so just add a layer to the ggplot() code including this geom_errorbar(aes(ymin=ci.low, ymax=ci.upp), width=10) - the ymin/ymax are values for the bars, and the width is in units of the x-axis (which is 50 units between levels because “Angle” is treated as an interval measure rather than as a factor).

# add ymin and ymax to the aesthetics list, which are needed for geom_errorbar
p3b <- mrot_summary %>% 
  ggplot(aes(x = Angle, y = meanRT, ymin=ci.low, ymax=ci.upp)) + 
    geom_bar(stat = "identity") +
    geom_errorbar(width=10) +
    coord_cartesian(ylim = c(0, 4500)) + 
    theme_classic() + labs(title="Correct trial response time means w/ 95% CI", y = "mean response time (ms)", x = "angle of disparity (degrees)")
p3b

Step 3.1 - Line Plot

In some cases you might prefer a point and line plot instead of bars, though it provides the same amount of information. Let’s see how this data looks as a line plot instead.
Instead of geom_bar use geom_point to get a point at each mean.
Now add another layer with geom_line(aes(group=1),stat = "identity") to draw lines connecting the points (the group=1 part specifies there is only one group of lines).

p3c <- mrot_summary %>% 
  ggplot(aes(x = Angle, y = meanRT, ymin=ci.low, ymax=ci.upp)) + 
    geom_point(stat = "identity") +
    geom_line(aes(group=1),stat = "identity") +
    geom_errorbar(width=10) +
    coord_cartesian(ylim = c(0, 4500)) + 
    theme_classic() + labs(title="Correct trial response time means w/ 95% CI", y = "mean response time (ms)", x = "angle of disparity (degrees)")
p3c

Step 4 - group the bar plot by “DesiredResponse”

The column “DesiredResponse” indicates whether the target shape was the same or different than the reference shape. Let’s treat the data like a 2x4 design and split each bar into two (one bar for “same” and one for “different”). We’ll color code the “DesiredResponse” values.
to accomplish this, first we have to re-do the grouping to also group by “DesiredResponse” values
then we add a new aesthetic called “fill” (referring to the bar fill color) to designate the colors by DesiredResponse values
then we need to adjust the position of the bars so that they are not on top of each other - use position=position_dodge() in the geoms (see the note about dodge values in the code comments)
Copy the Solution code below and then look at it piece by piece to understand it

# regroup data to include additional factor
mrot_bysub <- mrot_tib %>% filter(CorrectResponse=="Correct") %>%
  group_by(Participant, Angle, DesiredResponse) %>% 
  dplyr::summarise(
    sub_meanRT = mean(Time)
  ) %>% ungroup()
mrot_summary <- mrot_bysub %>% 
  group_by(Angle,DesiredResponse) %>% 
  dplyr::summarise(
    meanRT = mean(sub_meanRT),
    ci.low = ggplot2::mean_cl_normal(sub_meanRT)$ymin,
    ci.upp = ggplot2::mean_cl_normal(sub_meanRT)$ymax,
  ) %>% ungroup()
# add fill aesthetic as DesiredResponse
# use position_dodge to move the bars and errorbars horizontally so 
# they aren't on top of each other. A dodge value of .9*(distance between levels) is generally a good dodge value. Since "Angle" is stored as a numeric type variable, the distance between levels is 50, so we will use a dodge value of 45. But (as you will see later) if we had stored "Angle" as a factor type variable, then the distance between levels would be 1, and we would use a dodge value of .9
p4a <- mrot_summary %>% 
  ggplot(aes(x = Angle, y = meanRT, ymin=ci.low, ymax=ci.upp, fill=DesiredResponse)) + 
    geom_bar(stat = "identity", position=position_dodge(45)) +
    geom_errorbar(width=10, position=position_dodge(45)) +
    coord_cartesian(ylim = c(0, 4500)) + 
    theme_classic() + labs(title="Correct Trial response time means grouped by DesiredResponse", y = "mean response time (ms)", x = "angle of disparity (degrees)")
p4a

Step 4.1 Line Plot, grouped by “DesiredResponse”

Cool. Now that you’ve figured that out, try the same thing but as a line plot.
Instead of fill, use the color aesthetic.
Use a smaller dodge value (or no dodge value at all) now that you don’t have thick bars. If you specify a dodge value, you’ll need to do so for each of the three geom elements (geom_point, geom_line, and geom_errorbar)

mrot_bysub <- mrot_tib %>% drop_na(Time) %>%
  group_by(Participant, Angle, DesiredResponse) %>% 
  dplyr::summarise(
    sub_meanRT = mean(Time)
  ) %>% ungroup()
mrot_summary <- mrot_bysub %>% 
  group_by(Angle,DesiredResponse) %>% 
  dplyr::summarise(
    meanRT = mean(sub_meanRT),
    ci.low = ggplot2::mean_cl_normal(sub_meanRT)$ymin,
    ci.upp = ggplot2::mean_cl_normal(sub_meanRT)$ymax,
  ) %>% ungroup()
p4b <- mrot_summary %>% 
  ggplot(aes(x = Angle, y = meanRT, ymin=ci.low, ymax=ci.upp, color=DesiredResponse)) + 
    geom_point(stat = "identity", position=position_dodge(2)) +
    geom_line(aes(group=DesiredResponse), position=position_dodge(2)) +
    geom_errorbar(width=10, position=position_dodge(2)) +
    coord_cartesian(ylim = c(0, 4500)) + 
    theme_classic() + labs(title="Correct trial response time grouped by DesiredResponse", y = "mean response time (ms)", x = "angle of disparity (degrees)")
p4b

Step 5 - Visualize a relation between two variables

Is reaction time related to accuracy?
Let’s compute %correct for each subject, by counting the proportion of “Correct” values in the “CorrectResponse” column. We exclude trials with no response. Alternatively, you might have reasons to include these trials as incorrect for the accuracy calculation, but for this activity we will exclude them.

mrot_bysub <- mrot_tib %>% drop_na() %>% 
  group_by(Participant) %>% 
  dplyr::summarise(
    sub_meanRT = mean(Time),
    sub_accuracy = sum(CorrectResponse=="Correct")/n()
  ) %>% ungroup()

Scatter plot of response time by accuracy

now we can specify the x and y axes, and use geom_point() to plot a point for each subject with accuracy on the x-axis and response time on the y-axis
we can add a regression line by adding a layer with geom_smooth(method = "lm", formula = "y ~ x") (the intercept is automatically included). If you want to fit a quadratic line use formula = "y ~ x + I(x^2)" or formula = "y ~ poly(x,2)" instead.
the shaded area shows the 95% confidence band around the regression line, you can turn it off with se = FALSE

p5a <- mrot_bysub %>% 
  ggplot(aes(x = sub_accuracy, y = sub_meanRT)) + 
    xlim(.6,1) + # this controls the range of data values for the geom_smooth line
    geom_point(stat="identity") +
    geom_smooth(method = "lm", formula = "y ~ x", se = TRUE, fullrange=TRUE) + #intercept automatically included
    coord_cartesian(ylim = c(0, 6000)) + 
    theme_classic() + labs(title="scatter", y = "mean response time (ms)", x = "accuracy")
p5a

Scatter plot of response time by accuracy, grouped by DesiredResponse

What if we wanted to split up subject RT and accuracy by DesiredResponse (whether the shapes were same or different)?
just (a) recompute the subject-level means, grouping by DesiredResponse, and (b) add a color mapping aesthetic color=DesiredResponse in the aes() part of the ggplot code

mrot_bysub <- mrot_tib %>% drop_na() %>% 
  group_by(Participant, DesiredResponse) %>% 
  dplyr::summarise(
    sub_meanRT = mean(Time),
    sub_accuracy = sum(CorrectResponse=="Correct")/n()
  ) %>% ungroup()

## `summarise()` has grouped output by 'Participant'. You can override using the
## `.groups` argument.

p5b <- mrot_bysub %>% 
  ggplot(aes(x = sub_accuracy, y = sub_meanRT, color = DesiredResponse)) + 
    geom_point(stat="identity") +
    geom_smooth(method = "lm", formula = "y ~ x", se = FALSE, fullrange=TRUE) + #intercept automatically included
    coord_cartesian(ylim = c(0, 6000), xlim = c(.6,1)) + 
    xlim(.6,1) + # this controls the range of data values for the geom_smooth line
    theme_classic() + labs(title="scatter grouped by DesiredResponse", y = "mean response time (ms)", x = "accuracy")
p5b

Step 6. Alternatives to visualise variability

Here are some alternative approaches to visualizing variability. Take a look at each and think about possible uses. Try making the code work in your markdown doc if you are interested.

Some journals (e.g., “Kick the bar chart habit” | Nature Methods) have started to discourage plots (such as bar charts) that don’t show data variability well. Here are some alternatives you can use:

1. Use a box plot instead of plotting means:

a box plot shows:
- the median (line)
- a box spanning the interquartile range (IQR=25th to 75th percentile)
- “whiskers” extending 1.5*(IQR) past the box edges (specifically, to the most extreme data points that fall within 1.5*IQR)
- points for individual extreme values beyond the whiskers
notice that we have to re-specify the x-axis (Angle) variable as a factor to get geom_boxplot to work split the data by Angle mutate(Angle=forcats::as_factor(Angle)) – this changes the units of the x-axis (previously 50 units between points, now 1 unit between points), which affects values such as the position_dodge value (was 45 in the bar chart example, now it is .9)
notice that we pass in subject level means, instead of group means
also notice that we have to expand the ylim range to see all outliers

# regroup data 
mrot_bysub <- mrot_tib %>% mutate(Angle=forcats::as_factor(Angle)) %>%
  drop_na(Time) %>% 
  group_by(Participant, Angle, DesiredResponse) %>% 
  dplyr::summarise(
    sub_meanRT = mean(Time)
  ) %>% ungroup()

p6a <- mrot_bysub %>% 
  ggplot(aes(x = Angle, y = sub_meanRT, fill=DesiredResponse)) + 
    geom_boxplot(position=position_dodge(.9)) +
    coord_cartesian(ylim = c(0, 6000)) + 
    theme_classic() + labs(title="Response time grouped box plot", y = "mean response time (ms)", x = "angle of disparity (degrees)")
p6a

2. Use a violin plot layer under plotted means

a violin plot shows:
- a (smoothed) probability density across the full range (width=probability of a given value)
- you can add anything you like on top, here we add means and CI
again, we pass in subject level means for geom_violin
again, we use an expanded ylim range
for geom_point() and geom_errorbar this time we specify the group summarized means and CI

# group data 
mrot_bysub <- mrot_tib %>% mutate(Angle=forcats::as_factor(Angle)) %>%
  drop_na(Time) %>% 
  group_by(Participant, Angle, DesiredResponse) %>% 
  dplyr::summarise(
    sub_meanRT = mean(Time)
  ) %>% ungroup()

mrot_summary <- mrot_bysub %>% 
  group_by(Angle,DesiredResponse) %>% 
  dplyr::summarise(
    meanRT = mean(sub_meanRT),
    ci.low = ggplot2::mean_cl_normal(sub_meanRT)$ymin,
    ci.upp = ggplot2::mean_cl_normal(sub_meanRT)$ymax,
  ) %>% ungroup()
p6b <- mrot_bysub %>% 
  ggplot(aes(x = Angle, y = sub_meanRT, fill=DesiredResponse)) + 
    geom_violin(position = position_dodge(.9)) +
    geom_point(data=mrot_summary, aes(x = Angle, y = meanRT), 
               stat="identity", position = position_dodge(.9)) +
    geom_errorbar(data=mrot_summary, 
                  aes(x = Angle, y = meanRT, ymin=ci.low, ymax=ci.upp),
                  stat="identity", width=.2, position_dodge(.9)) +
    coord_cartesian(ylim = c(0, 6000)) + 
    theme_classic() + labs(title="Response time grouped violin plot w/ mean and CI", y = "mean response time (ms)", x = "angle of disparity (degrees)")
p6b

3. Show all the points: Use a 1D scatter with mean and error bars on top

sometimes it makes sense to show all individual points- here we use a 1D scatter (often called a stripplot), randomly jittered to make the points more visible
again, we pass in subject level means
again, we use an expanded ylim range
again, we specify the x-axis (Angle) variable as a factor
use alpha = (with a value from 0 to 1) to control transparency of any element (used here to make the points less prominent- notice that it changes the points in the legend too, there’s a fix for that in the next example).
we do the mean and error bars just like in the previous example using group-summarized data

# regroup data 
mrot_bysub <- mrot_tib %>% mutate(Angle=forcats::as_factor(Angle)) %>%
  drop_na(Time) %>% 
  group_by(Participant, Angle, DesiredResponse) %>% 
  dplyr::summarise(
    sub_meanRT = mean(Time)
  ) %>% ungroup()

p6c <- mrot_bysub %>% 
  ggplot(aes(x = Angle, y = sub_meanRT, color=DesiredResponse,
             shape=DesiredResponse)) + 
    geom_jitter(position=position_jitterdodge(.75), alpha = .5) +
    geom_point(data=mrot_summary, aes(x = Angle, y = meanRT),color="black",
               stat="identity", position = position_dodge(.75), 
               show.legend = FALSE) +
    geom_errorbar(data=mrot_summary, 
                  aes(x = Angle, y = meanRT, ymin=ci.low, ymax=ci.upp),
                  color="black", stat="identity", width=.2, 
                  position_dodge(.75), show.legend = FALSE) +
    coord_cartesian(ylim = c(0, 6000)) + 
    theme_classic() + 
    labs(title="grouped 1D scatter plot of Response time", 
         y = "mean response time (ms)", 
         x = "angle of disparity (degrees)")
p6c

3b. Too cluttered? Try these variations

Sina Plot

requires the “ggforce” library
geom_sina groups points to approximate the distribution, so the individual points make the shape of a violin plot
again, we specify the x-axis (Angle) variable as a factor
To make it easier to look at, adjust any of the following:
the “alpha” property controls transparency
“size” controls point size
the guides() layer controls the legend
the position_dodge() value controls spacing of the groups
Color:
- scale_color_manual()(for points and lines)
- scale_fill_manual() (for inside of shapes)
- adjust font sizes for all elements easily by setting base_size in the theme layer (default is 11 point)
- this example uses a color palette from the “wesanderson” package (requires that you install.packages("wesanderson"))
  - see this page or this one for color use examples

library(ggforce)
# regroup data 
mrot_bysub <- mrot_tib %>% mutate(Angle=forcats::as_factor(Angle)) %>%
  drop_na(Time) %>% 
  group_by(Participant, Angle, DesiredResponse) %>% 
  dplyr::summarise(
    sub_meanRT = mean(Time)
  ) %>% ungroup()

p6d <- mrot_bysub %>% 
  ggplot(aes(x = Angle, y = sub_meanRT, color=DesiredResponse,
             shape=DesiredResponse)) + 
    geom_sina(position=position_dodge(.6),alpha=.25, size = 1) +
    geom_point(data=mrot_summary, aes(x = Angle, y = meanRT,
                                     color=DesiredResponse),
              position=position_dodge(.6), show.legend = FALSE) +
    geom_errorbar(data=mrot_summary, aes(x = Angle, y = meanRT,
                                         ymin=ci.low, ymax=ci.upp,
                                         color=DesiredResponse),
                  stat="identity", width=.4, alpha = 1,
                  position_dodge(.6), show.legend = FALSE) +
    coord_cartesian(ylim = c(0, 6000)) + 
    theme_classic(base_size = 14) + 
    labs(title="grouped sina plot w/ means and CIs", 
         y = "mean response time (ms)", 
         x = "angle of disparity (degrees)") +
    guides(color = guide_legend(override.aes = list(alpha=1, size =2))) +
    scale_color_manual(values = wesanderson::wes_palette("Darjeeling1", n = 2))

p6d

4. Still want more? Check out `pirateplot()` in the “yarrr” package:

bar shows the mean
box around the bar shows conf interval
violin shows probability density
error bars show IQR
points show raw data

# regroup data 
mrot_bysub <- mrot_tib %>% mutate(Angle=forcats::as_factor(Angle)) %>% drop_na(Time) %>% 
  group_by(Participant, Angle, DesiredResponse) %>% 
  dplyr::summarise(
    sub_meanRT = mean(Time)
  ) %>% ungroup()
p6d <- mrot_bysub %>% 
  yarrr::pirateplot(data = ., sub_meanRT ~ Angle + DesiredResponse,
                    ylab = "Response Time(ms)", 
                    quant = c(.25, .75),
                    inf.method = 'ci')
p6d

## NULL

Last Step - Saving your plots as image files

if you want to save your plots, and control the file type, size, and resolution (e.g., to submit with a publication, or paste into a word doc) you can use the ggsave() function (for some file types you may need to install additional packages)
below is an example of how to save a plot we created above as a *.png file with 300 dpi resolution, 6in X 4in
if you intend to put multiple plots together to make a figure (e.g., in Adobe Illustrator, Inkscape, Powerpoint, …) it is recommended to use the *.svg file type (device = “svg”) to preserve the resolution of the chart components.

# notice that the width/height parameters will change the size of the image but the size of components in the chart (like point, text, etc.) will stay the same
# you will get an error with this code if the folder "images" does not exist in your project folder
ggsave(filename = "images/scatter1D.png", plot = p6c, device = "png", width = 6, height = 4, units = "in", dpi = 300)

References

parts of this activity were based on the PsyTeachR unit on data visualization developed at the University of Glasgow School of Psychology and Institute of Neuroscience and Psychology.
official documentation for ggplot2
R-Graph-Gallery examples
R for Data Science
Nordmann et al (preprint). “Data visualisation using R, for researchers who don’t use R” https://doi.org/10.31234/osf.io/4huvw
datanovia blog, “Top R color palettes,” by user Alboukadel
R Cookbook section on colors