I am working with online behavioral data where each user has multiple Bernoulli trials. I am familiar with fitting hierarchical models using lme4 in R, but now that my dataset has ~1MM unique users with 1-10 observations each, the lme4 model runs endlessly on my MacBook Pro. I had previously only ever fit such models to a few thousand users, and the run time was manageable.
library(lme4)
glmer(outcome ~ treatment + (1|user_id), family = "binomial", data = mydata)
How might I practically approach fitting a hierarchical model to such a large dataset?
There are a few ways to speed up a glmer fit:

- Set nAGQ = 0 in the glmer call. This estimates the fixed effects within the penalized iteratively reweighted least squares (PIRLS) step rather than in the outer nonlinear optimization, which is much faster at some cost in accuracy.
- Use "nloptwrap" as your optimizer in glmerControl. It defaults to NLopt's BOBYQA implementation, which is often faster than lme4's default optimizers.
- Set calc.derivs = FALSE in glmerControl to skip the finite-difference gradient and Hessian calculations that lme4 runs after convergence for its convergence checks.
# Example combining all three speed-ups
glmer(
  outcome ~ treatment + (1|user_id),
  family = "binomial",
  data = mydata,
  nAGQ = 0,  # fixed effects estimated in the PIRLS step
  control = glmerControl(
    optimizer = "nloptwrap",  # NLopt's BOBYQA
    calc.derivs = FALSE       # skip post-fit derivative calculations
  )
)
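
If you want to gauge the speed-up (and check what you give up in accuracy) before committing to the full ~1MM-user fit, one option is to time the default and fast settings side by side on a subsample. A minimal sketch, assuming mydata has columns outcome, treatment, and user_id:

library(lme4)

# Hypothetical subsample of 100k users to keep both fits tractable
sampled_ids <- sample(unique(mydata$user_id), 1e5)
sub <- mydata[mydata$user_id %in% sampled_ids, ]

# Baseline: default optimizer, Laplace approximation (nAGQ = 1)
system.time(
  fit_slow <- glmer(outcome ~ treatment + (1|user_id),
                    family = "binomial", data = sub)
)

# Fast settings
system.time(
  fit_fast <- glmer(outcome ~ treatment + (1|user_id),
                    family = "binomial", data = sub, nAGQ = 0,
                    control = glmerControl(optimizer = "nloptwrap",
                                           calc.derivs = FALSE))
)

# Since nAGQ = 0 is a rougher approximation, compare the estimates
cbind(fast = fixef(fit_fast), slow = fixef(fit_slow))

Comparing fixef() across the two fits on the subsample gives a sense of whether the nAGQ = 0 approximation is acceptable for your data before you scale up.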