last edited 2024-02-09
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_knit$set(root.dir = rprojroot::find_rstudio_root_file())
library(tidyverse)
# install.packages("effectsize")  # run once in the console if the package is not yet installed
The data file contains the variables pnum, CONDITION, and Hire_Rating (there's also pnum_rm, but ignore that for now):
- pnum stores participant ids (professional recruiters sampled from a conference meeting)
- CONDITION stores the "audio"/"text" condition, indicating that recruiters heard the pitch ("audio" = 1) or read the pitch as text ("text" = 0) - this is our grouping variable (IV)
- Hire_Rating stores the recruiters' rating of how likely they would be to hire the candidate (0 = not at all likely, 10 = extremely likely) - this is our dependent variable (DV)

Import the data into a tibble called ratings_tib. Now click on the tibble in the Environment window pane to take a quick look (Are your variables the right types? Are there missing values?).
#first import the data
ratings_tib <- readr::read_csv("data/Schroeder_Epley_2015_Study_4_abbrev.csv",
na = "NA")
# now make sure the columns we want as factors are treated that way, using forcats::as_factor()
ratings_tib <- ratings_tib |> dplyr::mutate(
condition_factor = forcats::as_factor(CONDITION)
) |> dplyr::mutate( #let's give names to the levels by using the fct_recode() function
condition_factor = forcats::fct_recode(condition_factor, text = "0", audio = "1")
) #after this you should see "text" and "audio" in the condition_factor column of ratings_tib
# use code below to re-order the levels of condition
# ratings_tib <- ratings_tib |> mutate(
# condition_factor = forcats::fct_relevel(condition_factor,"audio","text")
# )
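If you prefer to check the tibble from the console rather than the Environment pane, the minimal sketch below (not part of the original chunk) shows the column types and counts missing values per column.
# optional check: column types and number of missing values per column (assumes ratings_tib exists)
dplyr::glimpse(ratings_tib)
ratings_tib |>
  dplyr::summarise(dplyr::across(dplyr::everything(), ~ sum(is.na(.x))))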
To jog your memory, here is a link to the “import and examine” activity where we made descriptive tables and distribution plots. In that earlier activity you can go to Step 3.1 for tables, Step 3.2 for plots, and the code in the last “mini-challenge” section has an example of how to add a grouping variable to tables and plots.
#box plot
p1 <- ratings_tib |>
ggplot(aes(x = condition_factor, y = Hire_Rating)) +
geom_boxplot() +
theme_classic() + labs(title="Hire Rating box plot by group", y = "Hire Rating", x = "Group")
#histogram
p2 <- ratings_tib |>
ggplot( aes(x = Hire_Rating, fill = condition_factor)) +
geom_histogram(position = "identity", alpha = .5, binwidth = 1) +
theme_classic() + labs (title = "Hire Rating Distribution by Group")
#Extra: q-q plots for each group (using facets - see https://ggplot2-book.org/facet.html)
p3 <- ratings_tib |> group_by(condition_factor) |>
ggplot( aes(sample=Hire_Rating)) + geom_qq() + geom_qq_line() + theme_classic() +
facet_wrap(~ condition_factor) +
labs (title = "Q-Q of Hire Rating")
#table
ratings_tib |> group_by(condition_factor) |>
dplyr::summarise(
median = median(Hire_Rating),
mean = mean(Hire_Rating),
sd = sd(Hire_Rating),
cases = n() - sum(is.na(Hire_Rating))
) |>
knitr::kable(caption = "Hire Rating Descriptives by Group", digits = 3) |>
kableExtra::kable_styling(full_width = FALSE)
p1; p2; p3
Hire Rating Descriptives by Group

| condition_factor | median | mean | sd | cases |
|---|---|---|---|---|
| text | 2 | 2.889 | 2.055 | 18 |
| audio | 5 | 4.714 | 2.261 | 21 |
Use the t.test() function - you can find the documentation for the function by searching for it in the "Help" tab of your bottom right window pane. You'll need to specify:
- the formula (Hire_Rating ~ condition_factor) – alternatively (but much harder to read) you could instead specify two separate vectors: x = filter(ratings_tib, condition_factor=="audio")$Hire_Rating, y = filter(ratings_tib, condition_factor=="text")$Hire_Rating
- the data (use |> or %>% to pipe a tibble into the function then use data = _ for |> or data = . for %>%, otherwise use data = ratings_tib) - omit this if you use the x =, y = form
- the alternative hypothesis (alternative = "two.sided")
- whether to assume equal variances (var.equal = FALSE)
- whether the samples are paired (paired = FALSE)
# independent samples t-test (equal variance not assumed)
t.test(formula = Hire_Rating ~ condition_factor, data = ratings_tib,
alternative = "two.sided", var.equal = FALSE, paired = FALSE)
# or equivalently, using x and y vectors instead of specifying with a formula:
# t.test(x = subset(ratings_tib$Hire_Rating,ratings_tib$condition_factor=="text"),
# y = subset(ratings_tib$Hire_Rating,ratings_tib$condition_factor=="audio"),
# alternative = "two.sided",
# var.equal = FALSE, paired = FALSE)
##
## Welch Two Sample t-test
##
## data: Hire_Rating by condition_factor
## t = -2.6399, df = 36.856, p-value = 0.01208
## alternative hypothesis: true difference in means between group text and group audio is not equal to 0
## 95 percent confidence interval:
## -3.2265958 -0.4241979
## sample estimates:
## mean in group text mean in group audio
## 2.888889 4.714286
Welch Two Sample t-test is the name for an independent samples t-test that allows for unequal variances in each group (equivalent to the SPSS independent samples t-test "Equal variance not assumed" row).
The t-statistic is the ratio of the mean difference to its estimated standard error (the standard error of the difference is the square root of the sum of the squared standard errors of each mean, and the formula and degrees of freedom are modified to account for unequal variances - see Chapter 10 of the Field textbook), and the p-value tells you the probability of a t-statistic this far from zero under the null hypothesis (notice the degrees of freedom is not an integer - this is due to the modification allowing for unequal variance).
The sign of the t-statistic (positive or negative) is determined by the order of levels of the IV: first level ("text") minus second level ("audio"). If you wanted to reverse that, you could re-order the levels like this:
ratings_tib <- ratings_tib |> dplyr::mutate(condition_factor = forcats::fct_relevel(condition_factor, "audio", "text"))
- go ahead and do that now (it makes most sense to do it in the chunk where you import the data, then re-run all the chunks)
The 95% confidence interval gives an interval around the estimated difference between means: we expect 95% of intervals constructed this way to contain the true difference in population means
The sample means should match what you saw in your table above.
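If it helps to see the arithmetic, here is a minimal sketch (not part of the original activity) that rebuilds the Welch t-statistic and its non-integer degrees of freedom from the group means, variances, and sample sizes; the values should match the t.test() output above.
# hand computation of the Welch t-statistic and Welch-Satterthwaite degrees of freedom
grp <- ratings_tib |>
  dplyr::group_by(condition_factor) |>
  dplyr::summarise(m = mean(Hire_Rating, na.rm = TRUE),
                   v = var(Hire_Rating, na.rm = TRUE),
                   n = sum(!is.na(Hire_Rating)))
se_diff <- sqrt(grp$v[1]/grp$n[1] + grp$v[2]/grp$n[2])   # standard error of the mean difference
t_welch <- (grp$m[1] - grp$m[2]) / se_diff               # first factor level minus second level
df_welch <- se_diff^4 / ((grp$v[1]/grp$n[1])^2 / (grp$n[1] - 1) +
                         (grp$v[2]/grp$n[2])^2 / (grp$n[2] - 1))
c(t = t_welch, df = df_welch)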
In general, effect size estimates for two group comparisons are simply the difference between means expressed in standardized units. This effect size measure is referred to as Cohen's d. As you read in the Lakens article, there are different methods to calculate d. The effectsize package can give us these measures - here we will use the pooled variance version of d (ds in the Lakens article).
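To make "difference between means in standardized units" concrete, here is a minimal sketch (not required for the activity) of the pooled-SD version of d computed by hand; its absolute value should match the effectsize::cohens_d() result below.
# hand computation of pooled-SD Cohen's d (the sign follows the order of the factor levels)
grp <- ratings_tib |>
  dplyr::group_by(condition_factor) |>
  dplyr::summarise(m = mean(Hire_Rating, na.rm = TRUE),
                   s = sd(Hire_Rating, na.rm = TRUE),
                   n = sum(!is.na(Hire_Rating)))
s_pooled <- sqrt(((grp$n[1] - 1)*grp$s[1]^2 + (grp$n[2] - 1)*grp$s[2]^2) / (sum(grp$n) - 2))
(grp$m[1] - grp$m[2]) / s_pooled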
Use effectsize::cohens_d() to compute the effect size estimate. You'll need to specify:
- the formula (Hire_Rating ~ condition_factor) - this function does not accept formula = ...; instead you must specify the formula as the first argument (see the solution code if this doesn't make sense). How could you know that? - the info is in the "help" documentation for the function (look under the "Help" tab in RStudio)
- the data (data = ratings_tib, or if piping the data: data = _ for |> or data = . for %>%)
- pooled_sd = TRUE
- paired = FALSE
- alternative = "two.sided" to request a two-sided confidence interval around the estimate
- pipe the result into knitr::kable() (like in the solution example)
- if you are curious about Hedges' g (which should be less biased for small samples like this one), use effectsize::hedges_g() with exactly the same arguments that you supplied for Cohen's d
- the sign of the effect size is also determined by the order of levels in the IV (as mentioned above for the t-test)

Try it now in your Rmd doc.
ratings_tib <- ratings_tib |> mutate(
condition_factor = forcats::fct_relevel(condition_factor,"audio","text")
)
# pooled variance cohen's d
effectsize::cohens_d(Hire_Rating ~ condition_factor, data = ratings_tib, pooled_sd = TRUE,
paired = FALSE, alternative = "two.sided") |>
knitr::kable(caption = "Effect size Cohen's d (pooled)", digits = 3) |>
kableExtra::kable_styling(full_width = FALSE)
# pooled variance hedge's g
# effectsize::hedges_g(Hire_Rating ~ condition_factor, data = ratings_tib, pooled_sd = TRUE,
# paired = FALSE, alternative = "two.sided") |>
# knitr::kable(caption = "Effect size Hedge's g", digits = 3) |>
# kableExtra::kable_styling(full_width = FALSE)
Effect size Cohen's d (pooled)

| Cohens_d | CI | CI_low | CI_high |
|---|---|---|---|
| 0.842 | 0.95 | 0.178 | 1.494 |
What does this effect size estimate tell you (compared to the raw difference between group means, which is on the Hire_Rating scale)?
A report of the result of a comparison like this should include:
1. Type of comparison (this can be in the Methods section. e.g. “Means
were compared by an independent samples t-test (equal variances not
assumed).”)
2. Means and SDs for each group (you might also report a confidence
interval for each mean and/or for the difference between means)
3. Test statistic with degrees of freedom and p value
4. Effect size
Following the way the result was reported in the original publication
(Schroeder &
Epley, 2015) we could report the finding like this:
“Recruiters reported being more likely to hire the candidates when they
listened to pitches (M = 4.71, SD = 2.26) than when they read the same
pitches (M = 2.89, SD = 2.06), t(36.87) = 2.64, p = .01, 95% CI of the
difference = [0.42, 3.23], d = 0.84.”
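If you want to avoid transcription errors when writing a sentence like that, one option (a sketch only, not part of the original activity) is to pull the numbers straight out of the stored test result and format them with sprintf().
# build the statistical report values from the stored t-test result
tt <- t.test(Hire_Rating ~ condition_factor, data = ratings_tib,
             var.equal = FALSE, alternative = "two.sided")
sprintf("t(%.2f) = %.2f, p = %.3f, 95%% CI of the difference = [%.2f, %.2f]",
        tt$parameter, tt$statistic, tt$p.value, tt$conf.int[1], tt$conf.int[2])
# note: the sign of t and of the CI depends on the order of the factor levels (see above)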
You may, in some papers, see a "Bayes Factor" reported along with the typical t-test information. In this two group mean comparison case, a Bayes Factor tells us how much more likely the observed data are under the alternative hypothesis than under the null hypothesis. If the two hypotheses were equally plausible to begin with, you can think of it as odds in favor of the alternative hypothesis. To calculate a Bayes Factor for a t-test, you can use the ttestBF function in the BayesFactor package; the syntax is like the regular t.test function. Here is example code if you want to try it:
BayesFactor::ttestBF( formula = Hire_Rating ~ condition_factor, paired = FALSE, data = ratings_tib )
- the Bayes Factor is displayed after the r (this is an r scale value, not a correlation coefficient; it controls the width of the alternative distribution, and .707 is a standard value).
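If you want the Bayes Factor as a plain number (e.g., to include in a report), the sketch below stores the result and extracts it; it assumes the BayesFactor package is installed and uses extractBF(), which returns a small data frame with a bf column.
# run the Bayes Factor t-test and pull out the numeric Bayes Factor
bf_result <- BayesFactor::ttestBF(formula = Hire_Rating ~ condition_factor,
                                  paired = FALSE, data = ratings_tib)
BayesFactor::extractBF(bf_result)$bf   # evidence for the alternative relative to the null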
For a non-parametric (rank-based) alternative, use the wilcox.test() base R function. This function requires similar arguments compared to the t.test() function:
- the formula (formula = Hire_Rating ~ condition_factor)
- the data (use |> or %>% to pipe a tibble into the function then use data = _ for |> or data = . for %>%, otherwise use data = ratings_tib)
- the alternative hypothesis (alternative = "two.sided")
- whether the samples are paired (paired = FALSE) - the paired Wilcoxon is called the "signed rank" test and is not equivalent to the Mann-Whitney test.

Try it now in your R markdown doc.
# independent samples Wilcoxon rank sum test
wilcox.test(formula = Hire_Rating ~ condition_factor, data = ratings_tib,
paired = FALSE, alternative = "two.sided")
## Warning in wilcox.test.default(x = DATA[[1L]], y = DATA[[2L]], ...): cannot
## compute exact p-value with ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: Hire_Rating by condition_factor
## W = 275.5, p-value = 0.01441
## alternative hypothesis: true location shift is not equal to 0
(For a robust alternative to the t-test, see the WRS2::yuen() function.)
An effect size for the rank sum comparison can be expressed as a correlation (r = z/sqrt(N)). wilcox.test() internally computed a z-statistic but didn't include it in the output. The simplest way to get the z-stat is to work backwards from the p-value using the qnorm() function to get the corresponding z-stat from the normal distribution. We can access the exact (not rounded) p-value by first storing the result of wilcox.test() in a variable called Wtest, and then selecting the p-value like this: Wtest$p.value. Then compute WtestZ <- abs(qnorm(Wtest$p.value/2)) and WtestR <- WtestZ/sqrt(N) (note that there is no pos/neg sign on the effect size computed this way). You should find that the result is very close (though not exactly the same) to what you got in the SPSS activity, due to a continuity correction that the wilcox.test() function applies (you can change this by specifying correct = FALSE when you call the function).
# independent samples Wilcoxon rank sum test
Wtest <- wilcox.test(formula = Hire_Rating ~ condition_factor, data = ratings_tib,
                     paired = FALSE, alternative = "two.sided")
## Warning in wilcox.test.default(x = DATA[[1L]], y = DATA[[2L]], ...): cannot
## compute exact p-value with ties
WtestZ <- abs(qnorm(Wtest$p.value/2))
WtestN <- nrow(ratings_tib |> drop_na(Hire_Rating,condition_factor))
WtestR <- WtestZ/sqrt(WtestN)
## the line below puts the Z and R values into a formatted output
cat(sprintf("\nEffect size for rank sum comparison:\nZ = %.3f, r = %.3f", WtestZ, WtestR))
##
## Effect size for rank sum comparison:
## Z = 2.447, r = 0.392
Now, let’s imagine a different sample of data, where each recruiter
was exposed to both conditions (audio and text) and made the same
ratings, so from each recruiter there are two Hire_Rating
measures: one in the audio and one in the text condition. This is a
within-subjects or repeated measures design. When we analyze this sample
we need to account for the fact that measures in each condition are
dependent, meaning that the “audio” rating from a recruiter may
be related to the “text” rating from the same recruiter.
Hypothetically, imagine that the dataset we have been working with
was a within-subjects design, and our participant identifier is stored
in the pnum_rm
column instead of pnum
. There
were 3 more recruiters in the “audio” condition than the “text”
condition so we have to drop 3 points (these are the rows with NA values
for pnum_rm
) so that we have 18 pairs of data points.
Below, the plot on the left is the same data we’ve been working with
(with the 3 points dropped), and the plot on the right has lines
connecting the pairs of points.
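The plots themselves are not reproduced here, but a paired plot of that kind could be sketched roughly like this (using the pnum_rm column described above; this code is not part of the original activity):
# paired plot: one line per recruiter connecting their text and audio ratings
ratings_tib |>
  tidyr::drop_na(pnum_rm) |>
  ggplot(aes(x = condition_factor, y = Hire_Rating, group = pnum_rm)) +
  geom_line(alpha = 0.4) +
  geom_point() +
  theme_classic() +
  labs(title = "Hire Rating by condition (paired)", x = "Group", y = "Hire Rating")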
To run the paired analysis:
- first, make a new tibble where rows with a missing pnum_rm value are dropped: ratings_rm_tib <- ratings_tib |> drop_na(pnum_rm) - you should have 36 cases in your new tibble.
- then use the same t.test() function that you used before, but on the new sample of data, and specify paired = TRUE (the var.equal argument is irrelevant for paired samples)
- the way t.test() matches pairs of values is by the order they appear in the data (the first "audio" case gets matched with the first "text" case, and so on – if there are not an equal number of each group you will get an error). So sort the data by pnum_rm first, to put the cases in order (use dplyr::arrange(pnum_rm))
- in the solution code the cases are sorted by pnum_rm with dplyr::arrange() - see the code for how to do this.
ratings_rm_tib <- ratings_tib |> drop_na(pnum_rm) |> dplyr::arrange(pnum_rm, condition_factor)
#paired samples t-test
t.test(formula = Hire_Rating ~ condition_factor, data = ratings_rm_tib,
paired = TRUE, alternative = "two.sided")
##
## Paired t-test
##
## data: Hire_Rating by condition_factor
## t = 15, df = 17, p-value = 3.096e-11
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## 2.148364 2.851636
## sample estimates:
## mean difference
## 2.5
If you get an error when computing the paired effect size below, it is caused by a change between version 0.8.5 and 0.8.6 of the
"effectsize" package. The fix, which will come in 0.8.7 (not available
on CRAN when this was written), will be to use a new function called
repeated_measures_d()
. For now, the solution is to
downgrade your “effectsize” package version to 0.8.5. Use this command
in the console (install the remotes
package if you
don’t already have it):
remotes::install_version("effectsize", version = "0.8.5", repos = "http://cran.us.r-project.org")
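If you are unsure whether the downgrade applies to you, you can check which version is currently installed first:
# check the installed version of the effectsize package
packageVersion("effectsize")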
The paired t-test is based on the difference between the audio and text hiring ratings for each recruiter (17 degrees of freedom because we estimate one parameter, the mean difference between conditions).
For the effect size we have two options: (#1) compute the pooled-variance Cohen's d as if the groups were independent, or (#2) specify paired = TRUE in our function call, and we would get a different (larger) effect size than we got for the first (between subjects) study - referred to as dz in the Lakens (2013) article.
Use effectsize::cohens_d() to compute the effect size estimate - try it both ways. You'll need to specify:
- the formula (Hire_Rating ~ condition_factor)
- the data (data = ratings_rm_tib)
- paired (paired = FALSE for #1 or TRUE for #2)
- pooled_sd (pooled_sd = TRUE for #1, it is irrelevant for #2)
- alternative = "two.sided" to get a two sided confidence interval

Try it now in your Rmd doc.
#1
effectsize::cohens_d(Hire_Rating ~ condition_factor, data = ratings_rm_tib,
pooled_sd = TRUE, paired = FALSE, alternative = "two.sided") |>
knitr::kable(caption = "Effect size Cohen's d (pooled)", digits = 3) |>
kableExtra::kable_styling(full_width = FALSE)
#2 - the data must be sorted by pnum_rm, so it doesn't hurt to repeat the arrange() function here
ratings_rm_tib <- ratings_rm_tib |> dplyr::arrange(pnum_rm)
effectsize::cohens_d(Hire_Rating ~ condition_factor, data = ratings_rm_tib,
pooled_sd = FALSE, paired = TRUE, alternative = "two.sided") |>
knitr::kable(caption = "Effect size Cohen's dz (dependent)", digits = 3) |>
kableExtra::kable_styling(full_width = FALSE)
Effect size Cohen's d (pooled)

| Cohens_d | CI | CI_low | CI_high |
|---|---|---|---|
| 1.353 | 0.95 | 0.617 | 2.073 |
Effect size Cohen's dz (dependent)

| Cohens_d | CI | CI_low | CI_high |
|---|---|---|---|
| 3.536 | 0.95 | 2.263 | 4.793 |
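As a check on the dz value in the second table, here is a minimal sketch (not part of the original activity) that computes it directly from each recruiter's audio-minus-text difference: dz is the mean difference divided by the standard deviation of the differences.
# hand computation of dz from the per-recruiter differences
diffs <- ratings_rm_tib |>
  dplyr::group_by(pnum_rm) |>
  dplyr::summarise(d = Hire_Rating[condition_factor == "audio"] -
                       Hire_Rating[condition_factor == "text"])
mean(diffs$d) / sd(diffs$d)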
Use the BayesFactor::ttestBF() function to calculate a Bayes Factor, this time with paired = TRUE - the formula = syntax does not work for a paired test, so you will need to use the x = and y = options instead of formula = and data = (a sketch of this form is given below).
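A minimal sketch of that x/y form (assuming ratings_rm_tib from the earlier step) could look like this:
# paired Bayes Factor using x and y vectors (formula syntax is not available for paired tests)
rm_sorted <- ratings_rm_tib |> dplyr::arrange(pnum_rm)   # sort so x[i] and y[i] come from the same recruiter
BayesFactor::ttestBF(
  x = rm_sorted$Hire_Rating[rm_sorted$condition_factor == "audio"],
  y = rm_sorted$Hire_Rating[rm_sorted$condition_factor == "text"],
  paired = TRUE
)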
We can use the same wilcox.test() function; the only change is that we set the "paired" argument to TRUE (paired = TRUE). Try it now in your R markdown doc.
Don't forget to specify ratings_rm_tib as the data, and sort cases using dplyr::arrange() (especially if you are copy-pasting code from Step 6).
# paired samples Wilcoxon signed rank test
ratings_rm_tib |>
dplyr::arrange(pnum_rm) |>
wilcox.test(formula = Hire_Rating ~ condition_factor, data = _,
paired = TRUE, alternative = "two.sided")
## Warning in wilcox.test.default(x = DATA[[1L]], y = DATA[[2L]], ...): cannot
## compute exact p-value with ties
##
## Wilcoxon signed rank test with continuity correction
##
## data: Hire_Rating by condition_factor
## V = 171, p-value = 0.0001377
## alternative hypothesis: true location shift is not equal to 0
(For a robust alternative for the paired comparison, see the WRS2::yuend() function.)
An effect size can again be expressed as a correlation (r = z/sqrt(N)). Just like in Step 6, we have to compute this with a few lines of code (see solution). You should find that it is very close (though not exactly the same) to what you got in the SPSS activity, due to a correction that the wilcox.test() function applies (you can change this by specifying correct = FALSE when you call the function).
# paired samples Wilcoxon signed rank test
Wtest <- ratings_rm_tib |>
dplyr::arrange(pnum_rm) |>
wilcox.test(formula = Hire_Rating ~ condition_factor, data = _,
paired = TRUE, alternative = "two.sided")
## Warning in wilcox.test.default(x = DATA[[1L]], y = DATA[[2L]], ...): cannot
## compute exact p-value with ties
WtestZ <- abs(qnorm(Wtest$p.value/2))
WtestN <- nrow(ratings_rm_tib |> drop_na(Hire_Rating, condition_factor, pnum_rm))
WtestR <- WtestZ/sqrt(WtestN)
## the line below puts the Z and R values into a formatted output
cat(sprintf("\nEffect size for rank sum comparison:\nZ = %.3f, r = %.3f", WtestZ, WtestR))
##
## Effect size for rank sum comparison:
## Z = 3.812, r = 0.635
Some things to think about and explore:
- How would you depict the findings in the between-subjects design
example? In the within-subjects design example? (One option for the between-subjects case is sketched after this list.)
- check out Figure
7 to see how Schroeder & Epley
(2015) depicted the between-subjects data in their paper.
- for an example of plotting data from a within-subjects design, check
out Figure
2a in Mehr, Song,
& Spelke (2016)
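For the between-subjects case, one common option (a sketch only, not a reproduction of the published figure) is a bar for each group mean with a 95% confidence interval:
# group means with 95% confidence intervals (between-subjects design)
ratings_tib |>
  dplyr::group_by(condition_factor) |>
  dplyr::summarise(m = mean(Hire_Rating, na.rm = TRUE),
                   se = sd(Hire_Rating, na.rm = TRUE) / sqrt(sum(!is.na(Hire_Rating))),
                   n = sum(!is.na(Hire_Rating))) |>
  ggplot(aes(x = condition_factor, y = m)) +
  geom_col(width = 0.5, fill = "grey80") +
  geom_errorbar(aes(ymin = m - qt(.975, n - 1) * se,
                    ymax = m + qt(.975, n - 1) * se), width = 0.1) +
  theme_classic() +
  labs(title = "Mean Hire Rating by group (95% CI)", x = "Group", y = "Hire Rating")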