I have been reading up on how to do power analysis in simr
, but it seems that the only way to do a priori power analysis with mixed effects models is by using a simulated dataset. The vignette and journal article on it aren't very clear how you achieve this. As such, I want to create a dataset that resembles what I want in my final mixed effects model, but I'm a bit confused on how to do it.
Piggybacking off a previous question I had, this dataset is a fake dataset which includes the following variables:
Essentially what I want is to predict how much letter values moderate the influence of iq and verbal ability on reading. I also want to look at the variation of school and student ability, which should vary by iq and verbal ability. To do that, I made this dataset:
#### Libraries ####
library(tidyverse)
library(lmerTest)
#### Variables ####
set.seed(123) # set seed for reproducibility
id <- 1:300 # participants (students)
iq <- rnorm(n=300, mean=120, sd=15) # iq variable
va <- iq*.50 + rnorm(n=300, mean=117, sd = 18) # verbal ability
wr <- iq*.50 + rnorm(n=300, mean=115, sd = 16) # word reading
school <- c("school1","school2","school3","school4") # school of student
df <- data.frame(id,iq,va,wr,school)
#### Join Data with Value and Letters ####
value <- 1:40
df2 <- df %>%
inner_join(crossing(id = 1:300,
letter = letters[1:20])) %>%
mutate(value = rep(value, 150))
This in turn gives me this data frame:
id iq va wr school letter value
1 1 111.5929 159.9221 187.9806 school1 a 1
2 1 111.5929 159.9221 187.9806 school1 b 2
3 1 111.5929 159.9221 187.9806 school1 c 3
4 1 111.5929 159.9221 187.9806 school1 d 4
5 1 111.5929 159.9221 187.9806 school1 e 5
6 1 111.5929 159.9221 187.9806 school1 f 6
I already sense there are already issues with this, but I've run the model just to look at what it does.
lmer.df <- lmer(wr
~ (iq+va)*value
+ (1+iq+va|school/id),
data = df2)
summary(lmer.df)
The output doesn't look like it modeled things well, though I realize this is a complex model as well:
boundary (singular) fit: see help('isSingular')
Warning message:
Some predictor variables are on very different scales: consider rescaling
How do I better emulate what I want for the power analysis?
Just so this question has an answer people can look at in the future, I had this question answered on Cross Validated instead for those interested. The example code given by the person who answered is one way of achieving this:
require(lme4)
# Design
nparticipants <- 200
ntrials <- 1000
# simulate data
set.seed(19483093)
X <- data.frame(
vowel_count = runif(ntrials, 1, 1000),
person = sample(1:nparticipants, size = ntrials, replace = TRUE)
)
X$IQ <- runif(nparticipants, 80, 150)[X$person]
X$visual_motor_skills <- runif(nparticipants, 0, 100)[X$person]
X$person_error <- rnorm(nparticipants, 0, 0.5)[X$person]
# True model
X$accuracy <- with(X, plogis(((IQ-100)/100 - (visual_motor_skills-50)/100) * (vowel_count - 500)/1000 + person_error + rnorm(ntrials, 0, 0.5)))
# Model fitting with centered and scaled covariates
X$model_IQ <- (X$IQ - mean(X$IQ)) / sd(X$IQ)
X$model_vms <- (X$visual_motor_skills - mean(X$visual_motor_skills)) / sd(X$visual_motor_skills)
X$model_vowel_count <- (X$vowel_count - mean(X$vowel_count)) / sd(X$vowel_count)
# random intercept for each person
# This model ignores the fact that IQ and visual motor skills are only measured once for each person
lmer1 <- lme4::lmer(accuracy ~ model_IQ*model_vowel_count +
model_vms*model_vowel_count +
(1 | person), data = X)
summary(lmer1)
plot(lmer1)
# random intercept for each person and random slopes for IQ and visual motor skills
# Model without the logistic transform has singularity issues
lmer2 <- lme4::lmer(qlogis(accuracy) ~ model_IQ*model_vowel_count +
model_vms*model_vowel_count +
(1 + model_IQ + model_vms | person), data = X)
summary(lmer2)
plot(lmer2)