r  bigdata  hierarchical-data  lme4

How to fit hierarchical models on big data with repeated observations


I am working with online behavioral data where each user has multiple Bernoulli trials. I am familiar with fitting hierarchical models using lme4 in R, but now that my dataset has ~1 million unique users with 1-10 observations each, the lme4 model runs seemingly endlessly on my MacBook Pro. Previously I had only fit such models to a few thousand users, and the run time was manageable.

library(lme4)
glmer(outcome ~ treatment + (1|user_id), family = 'binomial', data = mydata)

How might I practically approach fitting a hierarchical model to such a large dataset?


Solution

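  • There are a few ways to speed up a glmer fit:

    • Set nAGQ = 0 in the glmer call. This is a faster but less exact form of parameter estimation, so the estimates can differ slightly from the default (nAGQ = 1, the Laplace approximation).
    • Specify "nloptwrap" as the optimizer in glmerControl.
    • Set calc.derivs = FALSE in glmerControl. This skips computing the gradient and Hessian after the fit, which also means the convergence checks that rely on them are not run.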

    More info here

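    # code example (same model as in the question, with the faster settings)
    library(lme4)
    glmer(
        outcome ~ treatment + (1 | user_id),
        family = "binomial",
        data = mydata,
        nAGQ = 0,  # faster, less exact estimation
        control = glmerControl(
            optimizer = "nloptwrap",
            calc.derivs = FALSE  # skip gradient/Hessian after fitting
        )
    )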
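
Before committing to the full dataset, it may be worth checking on a small sample how much the faster settings change the estimates. The following is only a rough sketch: the data are simulated (sim, n_users, and the effect sizes are made up for illustration and stand in for mydata), and timings will differ on real data.

```r
library(lme4)

set.seed(1)

# Hypothetical simulated data standing in for `mydata`:
# 2,000 users, 1-10 Bernoulli trials each, treatment assigned per user.
n_users      <- 2000
obs_per_user <- sample(1:10, n_users, replace = TRUE)
user_id      <- rep(seq_len(n_users), times = obs_per_user)
treatment    <- rep(rbinom(n_users, 1, 0.5), times = obs_per_user)
user_effect  <- rep(rnorm(n_users, sd = 1), times = obs_per_user)
outcome      <- rbinom(length(user_id), 1,
                       plogis(-1 + 0.5 * treatment + user_effect))
sim <- data.frame(outcome, treatment, user_id = factor(user_id))

# Default settings (Laplace approximation, default optimizers)
t_default <- system.time(
  fit_default <- glmer(outcome ~ treatment + (1 | user_id),
                       family = binomial, data = sim)
)

# Faster settings from the answer above
t_fast <- system.time(
  fit_fast <- glmer(outcome ~ treatment + (1 | user_id),
                    family = binomial, data = sim,
                    nAGQ = 0,
                    control = glmerControl(optimizer = "nloptwrap",
                                           calc.derivs = FALSE))
)

# Compare fixed-effect estimates and elapsed time side by side
print(rbind(default = fixef(fit_default), fast = fixef(fit_fast)))
print(c(default = t_default[["elapsed"]], fast = t_fast[["elapsed"]]))
```

If the two sets of fixed effects agree closely enough for your purposes on the sample, that gives some reassurance before running the fast version on all users.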