Search code examples
rsimulationlme4mixed-models

How to simulate lmer dataset for power analysis using simr?


I have been reading up on how to do power analysis in simr, but it seems that the only way to do a priori power analysis with mixed effects models is by using a simulated dataset. The vignette and journal article on it aren't very clear how you achieve this. As such, I want to create a dataset that resembles what I want in my final mixed effects model, but I'm a bit confused on how to do it.

Piggybacking off a previous question I had, this dataset is a fake dataset which includes the following variables:

  • wr: word reading
  • va: verbal ability
  • iq: intelligence quotient
  • id: subject id (student)
  • school: school of subject
  • letter: letter they are supposed to read
  • value: a rank of the letter (from 1 to 40)

Essentially what I want is to predict how much letter values moderate the influence of iq and verbal ability on reading. I also want to look at the variation of school and student ability, which should vary by iq and verbal ability. To do that, I made this dataset:

#### Libraries ####
library(tidyverse)
library(lmerTest)

#### Variables ####
set.seed(123) # set seed for reproducibility
id <- 1:300 # participants (students)
iq <- rnorm(n=300, mean=120, sd=15) # iq variable
va <- iq*.50 + rnorm(n=300, mean=117, sd = 18) # verbal ability
wr <- iq*.50 + rnorm(n=300, mean=115, sd = 16) # word reading
school <- c("school1","school2","school3","school4") # school of student
df <- data.frame(id,iq,va,wr,school)

#### Join Data with Value and Letters ####
value <- 1:40

df2 <- df %>% 
  inner_join(crossing(id = 1:300,
                      letter = letters[1:20])) %>% 
  mutate(value = rep(value, 150))

This in turn gives me this data frame:

 id       iq       va       wr  school letter value
1  1 111.5929 159.9221 187.9806 school1      a     1
2  1 111.5929 159.9221 187.9806 school1      b     2
3  1 111.5929 159.9221 187.9806 school1      c     3
4  1 111.5929 159.9221 187.9806 school1      d     4
5  1 111.5929 159.9221 187.9806 school1      e     5
6  1 111.5929 159.9221 187.9806 school1      f     6

I already sense there are already issues with this, but I've run the model just to look at what it does.

lmer.df <- lmer(wr
                ~ (iq+va)*value
                + (1+iq+va|school/id),
                data = df2)
summary(lmer.df)

The output doesn't look like it modeled things well, though I realize this is a complex model as well:

boundary (singular) fit: see help('isSingular')
Warning message:
Some predictor variables are on very different scales: consider rescaling 

How do I better emulate what I want for the power analysis?


Solution

  • Just so this question has an answer people can look at in the future, I had this question answered on Cross Validated instead for those interested. The example code given by the person who answered is one way of achieving this:

    require(lme4)
    
    # Design
    nparticipants <- 200
    ntrials <- 1000
    
    # simulate data
    set.seed(19483093)
    X <- data.frame(
      vowel_count = runif(ntrials, 1, 1000),
      person = sample(1:nparticipants, size = ntrials, replace = TRUE)
    )
    X$IQ <- runif(nparticipants, 80, 150)[X$person]
    X$visual_motor_skills <- runif(nparticipants, 0, 100)[X$person]
    X$person_error <- rnorm(nparticipants, 0, 0.5)[X$person]
    
    # True model
    X$accuracy <- with(X, plogis(((IQ-100)/100 - (visual_motor_skills-50)/100) * (vowel_count - 500)/1000 + person_error + rnorm(ntrials, 0, 0.5)))
    
    # Model fitting with centered and scaled covariates
    X$model_IQ <- (X$IQ - mean(X$IQ)) / sd(X$IQ)
    X$model_vms <- (X$visual_motor_skills - mean(X$visual_motor_skills)) / sd(X$visual_motor_skills)
    X$model_vowel_count <- (X$vowel_count - mean(X$vowel_count)) / sd(X$vowel_count)
    
    # random intercept for each person
    #    This model ignores the fact that IQ and visual motor skills are only measured once for each person
    lmer1 <- lme4::lmer(accuracy ~ model_IQ*model_vowel_count + 
                        model_vms*model_vowel_count + 
                        (1 | person), data = X)
    summary(lmer1)
    plot(lmer1)
    
    # random intercept for each person and random slopes for IQ and visual motor skills
    #   Model without the logistic transform has singularity issues
    lmer2 <- lme4::lmer(qlogis(accuracy) ~ model_IQ*model_vowel_count + 
                        model_vms*model_vowel_count + 
                        (1 + model_IQ + model_vms | person), data = X)
    summary(lmer2)
    plot(lmer2)