Tags: lme4, singular

glmer singular fit after small change in the data


I'm running univariate analyses to search for biological factors related to my disease outcome. I am dealing with related individuals, so I am using glmer, adjusting for covariates and including family membership as a random effect.

I would like to validate the findings from my discovery cohort (N=1000, % cases = 13%, N of groups = 750) in a replication cohort (N=350, % cases = 13%, N of groups = 250).

After doing this initially without any problems, I realised I had not accounted for use of a specific medication. Excluding its users removed 25 participants from my discovery cohort and 6 from my replication cohort.

I can re-run the analysis fine for the discovery dataset (results are pretty much unchanged); however, for the replication dataset I now get singular fits on almost every run.

I did not expect the removal of only 6 participants (half of whom were cases) to be so problematic. I would like to understand how this can be.

What I've looked at so far:

  • I wondered whether the number of males with my disease outcome may now be too small (since few males have the disease); however, all of the participants I removed were female
  • I wondered whether removing these participants led to a loss of related individuals discordant for the outcome, thereby causing the variance of the random effects to be estimated at 0; yet, I only lose one discordant pair when I remove those using the medication
  • I compared the summary() of the models with and without the 6 medication users removed. The random-effects variance is indeed estimated to be 0 once they are removed, although I suspect something is wrong, since the variance estimates are all comfortably above zero in the model that includes those 6 participants (which ran without any warnings)
  • I randomly removed 6 samples (3 with my outcome and 3 without) and did not get the singularity issues, suggesting that it was not simply a question of reduced sample size (further supported by other experiments with the same dataset but fewer samples also running fine)

Here's the code with which I am running the models:

glmer(outcome ~ sex + age + bmi + independent_var + (1 | family),
      data = data, family = binomial,
      control = glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e5)),
      nAGQ = 40)

I would like to know

  1. why this is happening, despite such a small change in the data sample, and
  2. if there's anything I can do to fix the problem?

Solution

  • why this is happening, despite such a small change in the data sample (?)

    Like many other model outcomes/diagnostics, the assessment of singularity is based on whether the random-effects variance is close enough to zero. It's not hard for a small change in the data to push results across that threshold. As an illustration, here is a histogram of estimated random-effects standard deviations from 1000 data sets simulated with the same parameters (5 groups, 5 observations per group, var(group) = var(resid) = 1, model y ~ 1 + (1|f)). Looking at the values near but not at zero, you can imagine that a slight change in the data might push them over the edge to zero.

    histogram of SD from simulated runs showing a Gaussian-ish curve peaking at about 0.7 with a spike at zero
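    That simulation is easy to reproduce. Below is a minimal sketch (my own reconstruction, not the exact code behind the figure) that simulates 1000 such data sets, fits each with lmer, and records the estimated among-group SD:

    ```r
    library(lme4)

    set.seed(101)
    n_group <- 5; n_per <- 5
    f <- gl(n_group, n_per)                    # grouping factor: 5 groups of 5

    sds <- replicate(1000, {
      ## var(group) = var(resid) = 1, intercept-only model
      y <- rnorm(n_group, sd = 1)[f] + rnorm(n_group * n_per, sd = 1)
      fit <- suppressMessages(lmer(y ~ 1 + (1 | f)))
      as.data.frame(VarCorr(fit))$sdcor[1]     # estimated among-group SD
    })

    hist(sds, breaks = 50, main = "", xlab = "estimated among-group SD")
    mean(sds < 1e-4)                           # proportion of singular fits
    ```

    With so few groups, a substantial fraction of fits land exactly on the zero boundary even though the true among-group SD is 1.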

    More specifically, you could look at the random effects variances from your model with and without the deleted cases (VarCorr(), or look at the values listed in the summary); the default tolerance for reporting a singular fit is SD < 1e-4 (or variance < 1e-8) [this applies to your case with (1|family) and a binomial model; things are a little more complicated for models with residual variance parameters and/or more complex random effect structures]. I would guess that your first model has a variance that's not too much bigger than 1e-8.
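    In code, assuming `fit_before` and `fit_after` are your fitted glmer models with and without the 6 medication users (those names are placeholders; substitute your own model objects), the comparison would look like:

    ```r
    library(lme4)

    ## fit_before / fit_after are placeholders for your two fitted models
    VarCorr(fit_before)   # among-family SD with the 6 participants included
    VarCorr(fit_after)    # among-family SD with them removed

    ## the default singularity test: SD < 1e-4 (i.e. variance < 1e-8)
    isSingular(fit_before, tol = 1e-4)
    isSingular(fit_after,  tol = 1e-4)

    ## pull out the variance itself for a direct look
    as.data.frame(VarCorr(fit_after))$vcov[1]
    ```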

    There are other slightly more complicated possibilities, but that would be my best guess. (Editing your question to show the summary() results for the models with and without deletion would allow a more educated guess ...)

  • if there's anything I can do to fix the problem?

    This is a very frequently asked question ... you could look at the help page for ?lme4::isSingular, or the relevant section of the GLMM FAQ. Perhaps the simplest options are (1) ignore the singularity or (2) use blme::bglmer() to force the model to be non-singular.
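    For option (2), a minimal sketch reusing the formula from your call: blme::bglmer() is a drop-in replacement for glmer() that adds a weak prior on the covariance parameters, pulling the random-effects SD away from the zero boundary (the particular prior shown is one common choice, not a recommendation for your data):

    ```r
    library(blme)   # also loads lme4

    fit_b <- bglmer(outcome ~ sex + age + bmi + independent_var + (1 | family),
                    data = data, family = binomial,
                    control = glmerControl(optimizer = "bobyqa",
                                           optCtrl = list(maxfun = 2e5)),
                    cov.prior = gamma)  # weak gamma prior keeps the SD off zero
    ```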