I am trying to run a zero-inflated negative binomial GLMM with glmmTMB
; however I am getting NA
s in the z
and p
values of my model summary output. I am not sure what the cause is; I have followed the vignette and online help, but I think there must be an issue with my data and the technique I am trying to use.
My data is similar to the Salamanders
example used in the supporting documentation: a negative binomial distribution, zero inflated, with the same data structure.
Where is the issue? Is this data suitable for using family = nbinom2
?
data:
> head(abun_data)
depl_ID Keyword_1 depl_dur logging n AmbientTemperature ElNino
1 B1-1-14_1 Bearded Pig 82 pre-logging 3 23.33333 before
2 B1-1-14_1 Malayan Porcupine 82 pre-logging 0 24.33333 before
3 B1-1-14_1 Pig-tailed Macaque 82 pre-logging 3 24.33333 before
4 B1-1-14_1 Sambar Deer 82 pre-logging 0 24.00000 before
5 B1-1-14_1 Red Muntjac 82 pre-logging 2 24.00000 before
6 B1-1-14_1 Lesser Mouse-deer 82 pre-logging 1 23.00000 before
> str(abun_data)
'data.frame': 1860 obs. of 7 variables:
$ depl_ID : Factor w/ 315 levels "B1-1-14_1","B1-1-14_2",..: 1 1 1 1 1 1 2 2 2 2 ...
$ Keyword_1 : Factor w/ 6 levels "Bearded Pig",..: 1 2 3 4 5 6 1 2 3 4 ...
$ depl_dur : num 82 82 82 82 82 82 26 26 26 26 ...
$ logging : Factor w/ 3 levels "logging","post-logging",..: 3 3 3 3 3 3 3 3 3 3 ...
$ n : int 3 0 3 0 2 1 2 0 0 0 ...
$ AmbientTemperature: num 23.3 24.3 24.3 24 24 ...
$ ElNino : Factor w/ 3 levels "after","before",..: 2 2 2 2 2 2 2 2 2 2 ...
My model:
> zinb <- glmmTMB(n ~ Keyword_1 * logging + (1|depl_ID), zi = ~ Keyword_1 * logging,
+ data = abun_data, family = "nbinom2")
Warning message:
In fitTMB(TMBStruc) :
Model convergence problem; non-positive-definite Hessian matrix. See vignette('troubleshooting')
> summary(zinb)
Family: nbinom2 ( log )
Formula: n ~ Keyword_1 * logging + (1 | depl_ID)
Zero inflation: ~Keyword_1 * logging
Data: abun_data
AIC BIC logLik deviance df.resid
NA NA NA NA 1822
Random effects:
Conditional model:
Groups Name Variance Std.Dev.
depl_ID (Intercept) 0.5413 0.7358
Number of obs: 1860, groups: depl_ID, 310
Overdispersion parameter for nbinom2 family (): 1.29
Conditional model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.99965 NA NA NA
Keyword_1Malayan Porcupine -1.30985 NA NA NA
Keyword_1Pig-tailed Macaque -0.90110 NA NA NA
Keyword_1Sambar Deer -1.34268 NA NA NA
Keyword_1Red Muntjac -0.76250 NA NA NA
Keyword_1Lesser Mouse-deer -16.21798 NA NA NA
loggingpost-logging 0.83935 NA NA NA
loggingpre-logging 0.58252 NA NA NA
Keyword_1Malayan Porcupine:loggingpost-logging -0.53276 NA NA NA
Keyword_1Pig-tailed Macaque:loggingpost-logging -5.52093 NA NA NA
Keyword_1Sambar Deer:loggingpost-logging -0.73450 NA NA NA
Keyword_1Red Muntjac:loggingpost-logging 0.04825 NA NA NA
Keyword_1Lesser Mouse-deer:loggingpost-logging -9.74912 NA NA NA
Keyword_1Malayan Porcupine:loggingpre-logging -0.18893 NA NA NA
Keyword_1Pig-tailed Macaque:loggingpre-logging -0.08802 NA NA NA
Keyword_1Sambar Deer:loggingpre-logging 0.72087 NA NA NA
Keyword_1Red Muntjac:loggingpre-logging 0.51223 NA NA NA
Keyword_1Lesser Mouse-deer:loggingpre-logging 15.10588 NA NA NA
Zero-inflation model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.3469 NA NA NA
Keyword_1Malayan Porcupine -11.7164 NA NA NA
Keyword_1Pig-tailed Macaque 1.5618 NA NA NA
Keyword_1Sambar Deer 0.6967 NA NA NA
Keyword_1Red Muntjac -17.6199 NA NA NA
Keyword_1Lesser Mouse-deer 18.7331 NA NA NA
loggingpost-logging -19.2344 NA NA NA
loggingpre-logging -2.1708 NA NA NA
Keyword_1Malayan Porcupine:loggingpost-logging 32.6525 NA NA NA
Keyword_1Pig-tailed Macaque:loggingpost-logging -1.2560 NA NA NA
Keyword_1Sambar Deer:loggingpost-logging 19.1848 NA NA NA
Keyword_1Red Muntjac:loggingpost-logging -3.4218 NA NA NA
Keyword_1Lesser Mouse-deer:loggingpost-logging 7.4168 NA NA NA
Keyword_1Malayan Porcupine:loggingpre-logging 14.3338 NA NA NA
Keyword_1Pig-tailed Macaque:loggingpre-logging -22.1736 NA NA NA
Keyword_1Sambar Deer:loggingpre-logging 1.6785 NA NA NA
Keyword_1Red Muntjac:loggingpre-logging 17.0664 NA NA NA
Keyword_1Lesser Mouse-deer:loggingpre-logging -14.3445 NA NA NA
The first clue is the warning
Model convergence problem; non-positive-definite Hessian matrix. See vignette('troubleshooting')
This means the model hasn't converged, or doesn't think it has, to a solution where the log-likelihood surface is downward-curved (i.e., a true maximum). That's why the standard errors can't be calculated (if you did the usual calculation they'd come out negative or complex). The log-likelihood could be calculated, but the model fit is suspect so glmmTMB returns NA
instead.
Next question: why? Sometimes this is mysterious and hard to diagnose, but in this case we have a good clue: when you see extreme parameter values (e.g. |beta|>10) in a (non-identity link) GLM, it almost always means that some form of complete separation is occurring. That is, there are some combinations of covariates (e.g. Keyword_1
==Lesser Mouse-deer
) where you always have zero counts. On the log scale, this means the density is infinitely lower than covariate combinations where you have a positive mean. The parameter is about -16, which corresponds to an expected multiplicative density difference of exp(-16) = 1e-07
. This isn't infinitesimal, but it's small enough that glmmTMB gets small enough differences in the log-likelihood that the optimizer stops. However, since the likelihood surface is almost flat, it can't compute curvature etc..
You could lump together or drop categories or do some form of regularization (e.g. see here or here ...); it might also make some sense to treat your Keyword_1
variable as a random effect, which would also have the effect of regularizing the estimates.