I've read several posts about having convergence issues with glmer and I have tried a couple recommended work arounds (changing optimizer, changing model iterations, etc.) but nothing seems to resolve my convergence issue. I was wondering if someone could help me figure out what I am doing wrong?
Data description: The data is count data of Monarch butterfly stages throughout several years at different sites within the year. The data has over 6000 observations, but the number of observations and sites vary across years. The data has several zeros (no Monarch observed) and the data is over dispersed.
Count: Response variable of how many individuals were counted
Year: Factor with 6 levels
Stage: Factor with 7 levels
Site: Random categorical variable. Listed as a random variable because each site is a replicate within the year
Data summary:
X siteID site year date stage
Min. : 977 Min. :2538 Groesbeck Field : 581 2015: 817 7/22/2016: 91 adult :976
1st Qu.:2696 1st Qu.:2541 Rain Garden : 490 2016:1258 7/1/2016 : 77 no_eggs :927
Median :4428 Median :2546 Preschool Butterfly Garden: 435 2017: 937 7/29/2016: 77 no_fifth :927
Mean :4419 Mean :2971 Spring Pond : 427 2018:1710 8/12/2016: 77 no_first :927
3rd Qu.:6144 3rd Qu.:2892 Whitetail field : 406 2019:1116 8/25/2016: 77 no_fourth:927
Max. :7808 Max. :6411 Pollinator Mound : 387 2020: 700 (Other) :6090 no_second:927
(Other) :3812 NA's : 49 no_third :927
count
Min. : 0.0000
1st Qu.: 0.0000
Median : 0.0000
Mean : 0.6422
3rd Qu.: 0.0000
Max. :66.0000
head(monarch_long, n=20)
X siteID site year date stage count
1 1505 2541 Groesbeck Field 2018 8/21/2018 no_eggs 66
2 1748 2541 Groesbeck Field 2019 8/12/2019 no_eggs 53
3 1365 2543 Rain Garden 2017 8/7/2017 no_eggs 51
4 1591 2543 Rain Garden 2018 8/21/2018 no_eggs 49
5 1504 2541 Groesbeck Field 2018 8/15/2018 no_eggs 47
6 1469 2546 Butterfly Garden 2018 8/9/2018 no_eggs 46
7 981 2561 Barred Owl Trail 2015 8/5/2015 no_eggs 45
8 2447 2546 Butterfly Garden 2018 8/24/2018 no_first 44
9 3424 2546 Butterfly Garden 2018 8/31/2018 no_second 40
10 1503 2541 Groesbeck Field 2018 8/7/2018 no_eggs 38
11 3423 2546 Butterfly Garden 2018 8/24/2018 no_second 38
12 1021 2541 Groesbeck Field 2015 8/20/2015 no_eggs 37
13 4400 2546 Butterfly Garden 2018 8/31/2018 no_third 33
14 1265 2538 Wetland Restoration 2016 8/25/2016 no_eggs 32
15 1749 2541 Groesbeck Field 2019 8/23/2019 no_eggs 32
16 1470 2546 Butterfly Garden 2018 8/17/2018 no_eggs 31
17 1471 2546 Butterfly Garden 2018 8/24/2018 no_eggs 30
18 1034 2559 Parking Lot Islands 2015 8/11/2015 no_eggs 28
19 1588 2543 Rain Garden 2018 8/2/2018 no_eggs 27
20 6275 2892 Vernal Pool 2017 9/20/2017 no_fifth 27```
Code:
glmer.nb(count~year+stage+(1|site),
control=glmerControl(optimizer="bobyqa",
optCtrl=list(maxfun=2e5)),
data=monarch_long)
Warning message: Warning message: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model failed to converge with max|grad| = 0.0123119 (tol = 0.002, component 1)
Please let me know if I need to provide more details. Any advice would be greatly appreciated!
tl;dr I think your fit is actually fine. For negative binomial GLMMs I have now taken to recommending glmmTMB
rather than lme4::glmer.nb
. In any case, it works for checking the model parameters with a completely different implementation/algorithm for the model and making sure the answers are the same, which is the gold standard for addressing convergence warnings ...
library(lme4)
system.time(m1 <- glmer.nb(count~year+stage+(1|site),
control=glmerControl(optimizer="bobyqa",
optCtrl=list(maxfun=2e5)),
data=monarch_long))
Refit with glmmTMB
(using family="nbinom2"
)
library(glmmTMB)
system.time(m2 <- glmmTMB(count~year+stage+(1|site),
family="nbinom2",
data=monarch_long))
## if you have multiple cores you can speed things up further ...
system.time(m3 <- glmmTMB(count~year+stage+(1|site),
family="nbinom2",
control=glmmTMBControl(parallel=5),
data=monarch_long)) ## 10 seconds
Comparing the log-likelihood (with logLik()
) shows that glmmTMB
actually does slightly (0.2 log-likelihood units) better, but in any case the coefficient estimates are very similar (graphical comparison of fixed effects only, but the variance estimates also differ by <1%):
library(ggplot2)
library(dotwhisker)
library(broom.mixed)
dwplot(list(glmer.nb=m1,glmmTMB=m2),effect="fixed") + theme_bw() +
geom_vline(xintercept=0,lty=2)
monarch_long <- read.csv("MLMP_2020\ long\ data\ -\ Sheet1.csv")
monarch_long$year <- factor(monarch_long$year)
monarch_long$stage <- factor(monarch_long$stage,
levels=c(paste0("no_",
c("eggs","first","second","third","fourth","fifth")),"adult"))