
Linear Mixed Effect Model in R will not converge


I have a long-format dataset on about 70K individuals. It tracks a continuous follow-up measure at 3, 6, and 12 months after baseline (each time point allows a ±1-month window). Members needed only one follow-up measure at any time point to be included in the study. Baseline data are complete and month 3 is 94% complete; however, there is a lot of missingness at the later time points (60% missing at month 6, 87% at month 12).
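As a rough check on the size of the data, the number of non-missing outcome rows (the rows lmer() actually uses, since rows with an NA outcome are dropped under R's default na.action) can be estimated from the percentages above. This sketch assumes exactly 70,000 members; the real count will differ somewhat.

```r
# Rough count of non-missing rows implied by the stated completeness percentages.
# Assumption: exactly 70,000 members (the question says "about 70K").
n_members <- 70000
prop_observed <- c(month0 = 1.00, month3 = 0.94, month6 = 0.40, month12 = 0.13)

rows_per_time <- n_members * prop_observed
rows_per_time      # non-missing rows at each time point
n_obs <- sum(rows_per_time)
n_obs              # about 173,000 rows enter the model
```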

ID  time_point  outcome
1   0           7.5
1   3           7.2
1   6           NA
1   12          7.0

I am using lmer() from R's lme4 package to fit a model with time as a factor in the fixed effects, plus a random intercept and a random slope for time (as a continuous variable), treating each member as a cluster. Code:

library(lme4)

# time_factor: time point as a factor; time: the same time point in months (numeric)
m.unstructured <- lmer(outcome ~ time_factor + (1 + time | id),
  data = df.long
)
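The model call assumes columns named id, time_factor and time, whereas the example table shows ID and time_point. A minimal base R sketch of how those columns could be derived, using only the four example rows from the table as toy data (the column names come from the model call; everything else is illustrative):

```r
# Toy version of the long-format example: one member, four time points.
df.long <- data.frame(
  id         = c(1, 1, 1, 1),
  time_point = c(0, 3, 6, 12),
  outcome    = c(7.5, 7.2, NA, 7.0)
)

# Fixed effects: time as a factor with baseline ("0") as the reference level,
# so under the default treatment contrasts each coefficient is a change from baseline.
df.long$time_factor <- factor(df.long$time_point, levels = c(0, 3, 6, 12))

# Random slope: time as a continuous variable, in months since baseline.
df.long$time <- df.long$time_point

# Grouping variable stored as a factor.
df.long$id <- factor(df.long$id)

str(df.long)
```

Keeping both versions of time lets the fixed part estimate a separate mean for each visit while the random part still uses a single per-member slope.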

I expect the model summary to give the change from baseline at each time point. However, the model fails to converge with the following error:

Warning message:
In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  Model failed to converge with max|grad| = 0.00317116 (tol = 0.002, component 1)

I can only get the model to run when I specify a different optimizer such as Nelder-Mead or bobyqa, e.g. control = lmerControl(optimizer = "Nelder_Mead"). I had similar problems when fitting this model with lme() from the nlme package.

My questions are: what is that argument doing, and why won't my model run without it? Is it due to the large sample size and the extensive missingness? Can I assume that results obtained with a different optimizer are valid? Eventually I'd like to compare covariance structures and add fixed effects such as sex and age, but I am stuck until I understand how to fit this model. Any guidance is appreciated!


Solution

  • There is a lot of information about this in the lme4 documentation and auxiliary info:

    More specifically, this page illustrates that the convergence-checking machinery starts to become unreliable at around 10,000 observations. You have roughly 170K observations: an 'observation' is a non-missing row in the long data frame, i.e. one subject:time_point combination, and the percentages in the question give about 70K × (1 + 0.94 + 0.40 + 0.13) ≈ 173K of them.

    What you are calling an error is not technically an error; it's a warning, and this is an important distinction (see here for more info). If you really had an error, you wouldn't be able to retrieve a result; since it's a warning, you can.

    My suggestion:

    • use allFit() to try your model with all of the available optimizers
    • compare the results (for whichever aspects of the model are important to you, e.g. the fixed effects of time_factor or the random effects variances, and to whatever tolerance is important to your application)
      • if all the optimizers give very different answers, you may be in trouble
      • if a subset of the optimizers give similar, sensible answers, pick your favourite from this subset (maybe the fastest one) and proceed with the rest of your analyses. After this, you might decide to use lmerControl(calc.derivs = FALSE) to suppress the (error-prone) derivative calculation, which will save you from further warnings (!) and save time ...
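The warning/error distinction can be seen in plain R, independent of lme4. In this sketch, fit_with_warning() and fit_with_error() are hypothetical stand-ins for a model-fitting call: the first signals a warning but still returns its result, the second stops before returning anything.

```r
# Hypothetical stand-in: signals a warning, then carries on and returns a result.
fit_with_warning <- function() {
  warning("Model failed to converge")
  list(estimate = 1.23)
}

# Hypothetical stand-in: stop() aborts the call, so the list is never returned.
fit_with_error <- function() {
  stop("Model failed")
  list(estimate = 1.23)
}

res_warn <- withCallingHandlers(
  fit_with_warning(),
  warning = function(w) {
    message("Caught warning: ", conditionMessage(w))
    invokeRestart("muffleWarning")   # resume execution after the warning
  }
)
res_err <- tryCatch(fit_with_error(), error = function(e) NULL)

str(res_warn)       # the result is still available
is.null(res_err)    # TRUE: an error leaves you with no result
```

For a fitted lme4 model, the convergence messages are also kept inside the returned object (e.g. m.unstructured@optinfo$conv$lme4$messages), so they can be inspected after the fact.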
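The allFit() workflow described in the list can be sketched as follows. Since the question's data are not available, this uses the sleepstudy data that ship with lme4 as a stand-in; the choice of bobyqa for the final refit is only an example of "picking one from the agreeing subset". Optimizers that depend on packages that are not installed (such as optimx or dfoptim) should be left out of the comparison.

```r
library(lme4)

# sleepstudy ships with lme4; it stands in for the question's df.long here.
fit <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# Refit the same model with each available optimizer.
fits <- allFit(fit)
ss <- summary(fits)

ss$which.OK   # which optimizers ran without an error
ss$msgs       # convergence messages from each optimizer
ss$fixef      # fixed effects: one row per optimizer
ss$sdcor      # random-effect SDs and correlations
ss$llik       # log-likelihoods; these should agree closely

# Spread of each fixed effect across optimizers; tiny values mean they agree.
apply(ss$fixef, 2, function(x) diff(range(x)))

# Refit with one optimizer from the agreeing subset, skipping the
# derivative calculation used by the convergence check.
fit_final <- update(fit, control = lmerControl(optimizer = "bobyqa",
                                               calc.derivs = FALSE))
fixef(fit_final)
```

What counts as "agreeing" is up to you: compare the quantities you will actually report (here the fixed effects and random-effect SDs) at the precision your application needs.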