I'm running univariate analyses to search for biological factors related to my disease outcome. Because I am dealing with related individuals, I am using glmer, adjusting for covariates and including family membership as a random effect.
I would like to validate the findings from my discovery cohort (N=1000, % cases = 13%, N of groups = 750) in a replication cohort (N=350, % cases = 13%, N of groups = 250).
After doing this initially without any problems, I realised I had not accounted for use of a specific medication. Removing those taking this drug led to the removal of 25 participants in my discovery cohort and 6 in my replication cohort.
I can re-run the analysis without issue for the discovery dataset (the results are essentially unchanged); however, for the replication dataset I now get singular fits on almost every run.
I did not expect the removal of only 6 participants (50% of whom were cases) to be so problematic. I would like to understand how this can be.
What I've looked at so far:
summary() of the models with and without the 6 participants removed due to medication use. The variance is indeed estimated to be 0 without the 6 medication users, although I suspect something is wrong, considering the variance estimates are all sufficiently high in the models that include these 6 participants (which ran without any warnings).

Here's the code with which I am running the models:
glmer(outcome ~ sex + age + bmi + independent_var + (1 | family),
      data = data, family = binomial,
      control = glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e5)),
      nAGQ = 40)
I would like to know
why this is happening, despite such a small change in the data sample?
Like many other model outcomes/diagnostics, the assessment of singularity is based on whether the random-effects variance is close enough to zero, and it's not hard for a small change in the data to push results across that threshold. For example, from this example, here is a histogram of estimated random-effects standard deviations from 1000 data sets simulated with the same parameters (5 groups, 5 observations per group, var(group) = var(resid) = 1, model y ~ 1 + (1|f)). Looking at the values near but not at zero, you can imagine that a slight change might push them over the edge to zero.
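A sketch of that kind of simulation (the exact code behind the linked example may differ; the parameter values here match the description above):

```r
## Simulate 1000 small data sets with the same parameters
## (5 groups, 5 observations per group, var(group) = var(resid) = 1),
## fit y ~ 1 + (1|f) to each, and collect the estimated
## random-effects standard deviations.
library(lme4)

set.seed(101)
n_groups <- 5; n_per <- 5; n_sim <- 1000
f <- factor(rep(seq_len(n_groups), each = n_per))

sim_sd <- function() {
  b <- rnorm(n_groups, sd = 1)          # true group effects, variance 1
  y <- b[f] + rnorm(length(f), sd = 1)  # residual variance 1
  m <- lmer(y ~ 1 + (1 | f))
  attr(VarCorr(m)$f, "stddev")          # estimated SD of the (1|f) term
}

## suppressMessages() silences the "boundary (singular) fit" messages
sds <- replicate(n_sim, suppressMessages(sim_sd()))

hist(sds, breaks = 50,
     main = "Estimated random-effects SDs, 1000 simulated data sets",
     xlab = "SD of (1|f)")
mean(sds < 1e-4)  # proportion of fits flagged as singular
```

The spike at zero in the histogram is the singular fits; the values just above it are the ones that a tiny perturbation of the data could push onto the boundary.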
More specifically, you could look at the random-effects variances from your model with and without the deleted cases (VarCorr(), or the values listed in the summary); the default tolerance for reporting a singular fit is SD < 1e-4 (or variance < 1e-8) [this applies to your case, with (1|family) and a binomial model; things are a little more complicated for models with residual variance parameters and/or more complex random-effects structures]. I would guess that your first model has a variance that's not too much bigger than 1e-8.
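As a concrete sketch of that check, using lme4's built-in cbpp data in place of your replication data (substitute your own fitted models):

```r
library(lme4)

## a standard binomial glmer example with one scalar random effect,
## standing in for the replication-cohort model
gm <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
            data = cbpp, family = binomial)

## extract the estimated SD of the (1|herd) term
sd_herd <- attr(VarCorr(gm)$herd, "stddev")
sd_herd

## the same criterion lme4 applies for a single scalar random effect:
isSingular(gm, tol = 1e-4)
sd_herd < 1e-4   # equivalently, variance < 1e-8
```

Running this on your two replication-cohort fits (with and without the 6 participants) will show whether the first fit was already sitting close to the 1e-4 boundary.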
There are other, slightly more complicated possibilities, but that would be my best guess. (Editing your question to show the summary() results for the models with and without deletion would allow a more educated guess ...)
if there's anything I can do to fix the problem?
This is a very frequently asked question ... you could look at the help page for ?lme4::isSingular, or the relevant section of the GLMM FAQ. Perhaps the simplest options are (1) ignore the singularity or (2) use blme::bglmer() to force the model to be non-singular.
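For option (2), bglmer() is essentially a drop-in replacement for glmer() that adds a weak prior on the covariance parameters, pulling the variance estimate off the zero boundary. A minimal sketch, again using the built-in cbpp data rather than your own (swap in your formula, data, and control settings):

```r
library(blme)  # also loads lme4

## same model as before, but fitted with blme's default weak
## covariance prior, which keeps the random-effects SD away from zero
gm_b <- bglmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
               data = cbpp, family = binomial)

summary(gm_b)
isSingular(gm_b)  # the prior regularizes the fit away from the boundary
```

For your model this would be bglmer(outcome ~ sex + age + bmi + independent_var + (1 | family), ...) with the same family and control arguments you are already using.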