Nick’s data - Positive autobiographical memories in cigarette smokers.

cigarette smokers recalled positive memories in an interview format, for each memory they rated:
- how long ago the event occurred (ordinal, higher values are more recent memories)
- if the memory involved cigarette smoking
- how frequently they think about the memory
- how (negative-positive) they feel while recalling the memory
- how vivid the memory is

library("tidyverse")

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library("lme4")

## Loading required package: Matrix
## 
## Attaching package: 'Matrix'
## 
## The following objects are masked from 'package:tidyr':
## 
##     expand, pack, unpack

mem_init_tib <- read_csv("data/mem_initial_ratings.csv") |> mutate(id = as.character(id))

## Rows: 522 Columns: 11
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): mem_description, mem_keyphrase
## dbl (9): id, memory_number, mem_time, mem_smoking, mem_valence, mem_intensit...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Hypothesis: In cigarette smokers, the vividness of a memory is related to (a) how old it is, (b) if it involves smoking, and (c) if it is frequently recalled.

First let’s look at the vividness ratings (the DV)

quants <- quantile(mem_init_tib$mem_vividness, c(.25,.5,.75), na.rm = TRUE)
vividness_mean <- mean(mem_init_tib$mem_vividness, na.rm = TRUE)
p1 <- mem_init_tib %>% drop_na(mem_vividness) %>%
  ggplot( aes(x=mem_vividness)) + geom_histogram(binwidth=1) + 
    geom_vline(xintercept = quants, colour="red", linetype = "dashed") +
    geom_vline(xintercept = vividness_mean, colour="black") +
    theme_classic() + labs (title = "Mem vividness distribution- all individuals")
p2 <- mem_init_tib %>% drop_na(mem_vividness) %>%
    ggplot( aes(sample=mem_vividness)) + geom_qq() + geom_qq_line() + theme_classic() +
      labs (title = "Mem vividness Q-Q- all individuals")
p1; p2

#### Now the time variable (how old is the memory?)

quants <- quantile(mem_init_tib$mem_time, c(.25,.5,.75), na.rm = TRUE)
time_mean <- mean(mem_init_tib$mem_time, na.rm = TRUE)
p1 <- mem_init_tib %>% drop_na(mem_time) %>%
  ggplot( aes(x=mem_time)) + geom_histogram(binwidth=1) + 
    geom_vline(xintercept = quants, colour="red", linetype = "dashed") +
    geom_vline(xintercept = time_mean, colour="black") +
    theme_classic() + labs (title = "Mem time distribution- all individuals")
p2 <- mem_init_tib %>% drop_na(mem_time) %>%
    ggplot( aes(sample=mem_time)) + geom_qq() + geom_qq_line() + theme_classic() +
      labs (title = "Mem time Q-Q- all individuals")
p1; p2

Now the frequency variable (how often do you think about the memory?)

quants <- quantile(mem_init_tib$mem_frequency, c(.25,.5,.75), na.rm = TRUE)
frequency_mean <- mean(mem_init_tib$mem_frequency, na.rm = TRUE)
p1 <- mem_init_tib %>% drop_na(mem_frequency) %>%
  ggplot( aes(x=mem_frequency)) + geom_histogram(binwidth=1) + 
    geom_vline(xintercept = quants, colour="red", linetype = "dashed") +
    geom_vline(xintercept = frequency_mean, colour="black") +
    theme_classic() + labs (title = "Mem frequency distribution- all individuals")
p2 <- mem_init_tib %>% drop_na(mem_frequency) %>%
    ggplot( aes(sample=mem_frequency)) + geom_qq() + geom_qq_line() + theme_classic() +
      labs (title = "Mem frequency Q-Q- all individuals")
p1; p2

Fit the full model: vividness explained by time, smoking, and frequency

we get a singular fit warning - therefore, the model fit results are not interpretable.
in linear mixed models, a singular fit occurs when random effect variance is estimated very near zero or collinearity between random effects is very near 1 or -1. This can occur when the data is not sufficiently informative
in this case the mem_frequency random effect (by-subject slope) is highly negatively correlated with the by-subject intercept (possibly due to lack of variance in the mem_frequency measure)
we can examine the mem_frequency variable more closely to see this (Q: how did we know that this might be the problem? A: During data collection we suspected that some participants did not understand this measure fully)

model1   = lmer(mem_vividness ~ mem_time + mem_smoking + mem_frequency + (mem_time + mem_smoking + mem_frequency| id), data=mem_init_tib)

## boundary (singular) fit: see help('isSingular')

summary(model1)

## Linear mixed model fit by REML ['lmerMod']
## Formula: mem_vividness ~ mem_time + mem_smoking + mem_frequency + (mem_time +  
##     mem_smoking + mem_frequency | id)
##    Data: mem_init_tib
## 
## REML criterion at convergence: 1216.8
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -5.1005 -0.3896  0.0838  0.5239  2.5855 
## 
## Random effects:
##  Groups   Name          Variance Std.Dev. Corr             
##  id       (Intercept)   1.153553 1.07404                   
##           mem_time      0.005238 0.07237  -0.69            
##           mem_smoking   0.035721 0.18900  -0.57  0.11      
##           mem_frequency 0.015915 0.12615  -1.00  0.68  0.50
##  Residual               0.523723 0.72369                   
## Number of obs: 522, groups:  id, 17
## 
## Fixed effects:
##               Estimate Std. Error t value
## (Intercept)    3.29191    0.30238  10.886
## mem_time       0.08442    0.02875   2.936
## mem_smoking    0.08194    0.08169   1.003
## mem_frequency  0.22672    0.04189   5.412
## 
## Correlation of Fixed Effects:
##             (Intr) mem_tm mm_smk
## mem_time    -0.580              
## mem_smoking -0.599  0.192       
## mem_freqncy -0.785  0.241  0.238
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')

#plot individual frequency ratings by subject
#mem_init_tib |> ggplot(aes(x = id, y = mem_frequency)) +
#  geom_jitter(shape = 1, width = .1, height = 0)

so we will delete frequency from the model

model1   = lmer(mem_vividness ~ mem_time + mem_smoking  + (mem_time + mem_smoking | id), data=mem_init_tib, control = lmerControl(optimizer = "bobyqa"))

summary(model1)

## Linear mixed model fit by REML ['lmerMod']
## Formula: mem_vividness ~ mem_time + mem_smoking + (mem_time + mem_smoking |  
##     id)
##    Data: mem_init_tib
## Control: lmerControl(optimizer = "bobyqa")
## 
## REML criterion at convergence: 1279.1
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.7733 -0.3443  0.1148  0.5240  2.7004 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr       
##  id       (Intercept) 1.028609 1.01420             
##           mem_time    0.006202 0.07875  -0.79      
##           mem_smoking 0.071004 0.26647  -0.80  0.40
##  Residual             0.596518 0.77235             
## Number of obs: 522, groups:  id, 17
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  3.83087    0.28622  13.384
## mem_time     0.10335    0.03070   3.366
## mem_smoking  0.05957    0.09677   0.616
## 
## Correlation of Fixed Effects:
##             (Intr) mem_tm
## mem_time    -0.696       
## mem_smoking -0.776  0.308

smoking is not informative, but time explains vividness

recent memories are more vivid
to do a significance test, let’s compare a model with mem_time to a model with that fixed effect removed (using R’s anova() function)

d <- mem_init_tib
baseline  <- lmer(mem_vividness ~ 1 + (mem_time | id), data=d, REML = FALSE)
model1   = lmer(mem_vividness ~ mem_time  + (mem_time | id), data=d, control = lmerControl(optimizer = "bobyqa"))

summary(model1)

## Linear mixed model fit by REML ['lmerMod']
## Formula: mem_vividness ~ mem_time + (mem_time | id)
##    Data: d
## Control: lmerControl(optimizer = "bobyqa")
## 
## REML criterion at convergence: 1284.6
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.8236 -0.2973  0.1131  0.5379  2.0480 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  id       (Intercept) 0.459648 0.67797       
##           mem_time    0.004492 0.06702  -0.75
##  Residual             0.614801 0.78409       
## Number of obs: 522, groups:  id, 17
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  3.92912    0.18328  21.438
## mem_time     0.10009    0.02872   3.485
## 
## Correlation of Fixed Effects:
##          (Intr)
## mem_time -0.706

anova(model1,baseline)

## refitting model(s) with ML (instead of REML)

## Data: d
## Models:
## baseline: mem_vividness ~ 1 + (mem_time | id)
## model1: mem_vividness ~ mem_time + (mem_time | id)
##          npar    AIC    BIC  logLik deviance Chisq Df Pr(>Chisq)   
## baseline    5 1295.8 1317.1 -642.91   1285.8                       
## model1      6 1289.0 1314.5 -638.50   1277.0 8.812  1   0.002993 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Extra - mixed effects logistic model

does the age of a memory explain whether it is smoking or not?
see model below - more recent memories are likely to involve smoking

d <- mem_init_tib |> filter(mem_smoking<3) 
d <- d |> mutate(mem_smoking = 2 - mem_smoking)
smoke_model  <- glmer(
  mem_smoking ~ mem_time + (mem_time |id), 
  family = "binomial", data=d)
summary(smoke_model)

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: mem_smoking ~ mem_time + (mem_time | id)
##    Data: d
## 
##      AIC      BIC   logLik deviance df.resid 
##    672.5    693.7   -331.2    662.5      512 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.6635 -0.9400  0.4058  0.8195  1.3530 
## 
## Random effects:
##  Groups Name        Variance Std.Dev. Corr 
##  id     (Intercept) 1.9567   1.399         
##         mem_time    0.1274   0.357    -0.96
## Number of obs: 517, groups:  id, 17
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)   
## (Intercept)  -0.7010     0.4227  -1.658  0.09724 . 
## mem_time      0.3339     0.1252   2.667  0.00764 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##          (Intr)
## mem_time -0.916

smoke_baseline <- glmer(
  mem_smoking ~ 1 + (mem_time |id), 
  family = "binomial", data=d)
summary(smoke_baseline)

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: mem_smoking ~ 1 + (mem_time | id)
##    Data: d
## 
##      AIC      BIC   logLik deviance df.resid 
##    677.7    694.7   -334.8    669.7      513 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.5918 -1.0008  0.3952  0.8260  1.3549 
## 
## Random effects:
##  Groups Name        Variance Std.Dev. Corr 
##  id     (Intercept) 3.3674   1.8351        
##         mem_time    0.2235   0.4728   -0.99
## Number of obs: 517, groups:  id, 17
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)   
## (Intercept)   0.4894     0.1514   3.233  0.00122 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova(smoke_model,smoke_baseline)

## Data: d
## Models:
## smoke_baseline: mem_smoking ~ 1 + (mem_time | id)
## smoke_model: mem_smoking ~ mem_time + (mem_time | id)
##                npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)   
## smoke_baseline    4 677.70 694.69 -334.85   669.70                        
## smoke_model       5 672.48 693.72 -331.24   662.48 7.2196  1   0.007211 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

performance::r2(smoke_model)

## # R2 for Mixed Models
## 
##   Conditional R2: 0.221
##      Marginal R2: 0.081

SmokingNA-ABMQ

2024-01-29

Nick’s data - Positive autobiographical memories in cigarette smokers.

Fit the full model: vividness explained by time, smoking, and frequency

so we will delete frequency from the model

smoking is not informative, but time explains vividness

Extra - mixed effects logistic model