Search code examples
rmixed-modelsglmmtmb

Why am I getting NAs in the model summary output? zero-inflated GLMM with glmmTMB()


I am trying to run a zero-inflated negative binomial GLMM with glmmTMB; however I am getting NAs in the z and p values of my model summary output. I am not sure what the cause is; I have followed the vignette and online help, but I think there must be an issue with my data and the technique I am trying to use. My data is similar to the Salamanders example used in the supporting documentation: a negative binomial distribution, zero inflated, with the same data structure.

Where is the issue? Is this data suitable for using family = nbinom2?

data:

> head(abun_data)
    depl_ID          Keyword_1 depl_dur     logging n AmbientTemperature ElNino
1 B1-1-14_1        Bearded Pig       82 pre-logging 3           23.33333 before
2 B1-1-14_1  Malayan Porcupine       82 pre-logging 0           24.33333 before
3 B1-1-14_1 Pig-tailed Macaque       82 pre-logging 3           24.33333 before
4 B1-1-14_1        Sambar Deer       82 pre-logging 0           24.00000 before
5 B1-1-14_1        Red Muntjac       82 pre-logging 2           24.00000 before
6 B1-1-14_1  Lesser Mouse-deer       82 pre-logging 1           23.00000 before

> str(abun_data)
'data.frame':   1860 obs. of  7 variables:
 $ depl_ID           : Factor w/ 315 levels "B1-1-14_1","B1-1-14_2",..: 1 1 1 1 1 1 2 2 2 2 ...
 $ Keyword_1         : Factor w/ 6 levels "Bearded Pig",..: 1 2 3 4 5 6 1 2 3 4 ...
 $ depl_dur          : num  82 82 82 82 82 82 26 26 26 26 ...
 $ logging           : Factor w/ 3 levels "logging","post-logging",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ n                 : int  3 0 3 0 2 1 2 0 0 0 ...
 $ AmbientTemperature: num  23.3 24.3 24.3 24 24 ...
 $ ElNino            : Factor w/ 3 levels "after","before",..: 2 2 2 2 2 2 2 2 2 2 ...

My model:

> zinb <- glmmTMB(n ~ Keyword_1 * logging + (1|depl_ID), zi = ~ Keyword_1 * logging,
+                 data = abun_data, family = "nbinom2")
Warning message:
In fitTMB(TMBStruc) :
  Model convergence problem; non-positive-definite Hessian matrix. See vignette('troubleshooting')
> summary(zinb)
 Family: nbinom2  ( log )
Formula:          n ~ Keyword_1 * logging + (1 | depl_ID)
Zero inflation:     ~Keyword_1 * logging
Data: abun_data

     AIC      BIC   logLik deviance df.resid 
      NA       NA       NA       NA     1822 

Random effects:

Conditional model:
 Groups  Name        Variance Std.Dev.
 depl_ID (Intercept) 0.5413   0.7358  
Number of obs: 1860, groups:  depl_ID, 310

Overdispersion parameter for nbinom2 family (): 1.29 

Conditional model:
                                                 Estimate Std. Error z value Pr(>|z|)
(Intercept)                                       0.99965         NA      NA       NA
Keyword_1Malayan Porcupine                       -1.30985         NA      NA       NA
Keyword_1Pig-tailed Macaque                      -0.90110         NA      NA       NA
Keyword_1Sambar Deer                             -1.34268         NA      NA       NA
Keyword_1Red Muntjac                             -0.76250         NA      NA       NA
Keyword_1Lesser Mouse-deer                      -16.21798         NA      NA       NA
loggingpost-logging                               0.83935         NA      NA       NA
loggingpre-logging                                0.58252         NA      NA       NA
Keyword_1Malayan Porcupine:loggingpost-logging   -0.53276         NA      NA       NA
Keyword_1Pig-tailed Macaque:loggingpost-logging  -5.52093         NA      NA       NA
Keyword_1Sambar Deer:loggingpost-logging         -0.73450         NA      NA       NA
Keyword_1Red Muntjac:loggingpost-logging          0.04825         NA      NA       NA
Keyword_1Lesser Mouse-deer:loggingpost-logging   -9.74912         NA      NA       NA
Keyword_1Malayan Porcupine:loggingpre-logging    -0.18893         NA      NA       NA
Keyword_1Pig-tailed Macaque:loggingpre-logging   -0.08802         NA      NA       NA
Keyword_1Sambar Deer:loggingpre-logging           0.72087         NA      NA       NA
Keyword_1Red Muntjac:loggingpre-logging           0.51223         NA      NA       NA
Keyword_1Lesser Mouse-deer:loggingpre-logging    15.10588         NA      NA       NA

Zero-inflation model:
                                                Estimate Std. Error z value Pr(>|z|)
(Intercept)                                      -1.3469         NA      NA       NA
Keyword_1Malayan Porcupine                      -11.7164         NA      NA       NA
Keyword_1Pig-tailed Macaque                       1.5618         NA      NA       NA
Keyword_1Sambar Deer                              0.6967         NA      NA       NA
Keyword_1Red Muntjac                            -17.6199         NA      NA       NA
Keyword_1Lesser Mouse-deer                       18.7331         NA      NA       NA
loggingpost-logging                             -19.2344         NA      NA       NA
loggingpre-logging                               -2.1708         NA      NA       NA
Keyword_1Malayan Porcupine:loggingpost-logging   32.6525         NA      NA       NA
Keyword_1Pig-tailed Macaque:loggingpost-logging  -1.2560         NA      NA       NA
Keyword_1Sambar Deer:loggingpost-logging         19.1848         NA      NA       NA
Keyword_1Red Muntjac:loggingpost-logging         -3.4218         NA      NA       NA
Keyword_1Lesser Mouse-deer:loggingpost-logging    7.4168         NA      NA       NA
Keyword_1Malayan Porcupine:loggingpre-logging    14.3338         NA      NA       NA
Keyword_1Pig-tailed Macaque:loggingpre-logging  -22.1736         NA      NA       NA
Keyword_1Sambar Deer:loggingpre-logging           1.6785         NA      NA       NA
Keyword_1Red Muntjac:loggingpre-logging          17.0664         NA      NA       NA
Keyword_1Lesser Mouse-deer:loggingpre-logging   -14.3445         NA      NA       NA

Solution

  • The first clue is the warning

    Model convergence problem; non-positive-definite Hessian matrix. See vignette('troubleshooting')

    This means the model hasn't converged, or doesn't think it has, to a solution where the log-likelihood surface is downward-curved (i.e., a true maximum). That's why the standard errors can't be calculated (if you did the usual calculation they'd come out negative or complex). The log-likelihood could be calculated, but the model fit is suspect so glmmTMB returns NA instead.

    Next question: why? Sometimes this is mysterious and hard to diagnose, but in this case we have a good clue: when you see extreme parameter values (e.g. |beta|>10) in a (non-identity link) GLM, it almost always means that some form of complete separation is occurring. That is, there are some combinations of covariates (e.g. Keyword_1==Lesser Mouse-deer) where you always have zero counts. On the log scale, this means the density is infinitely lower than covariate combinations where you have a positive mean. The parameter is about -16, which corresponds to an expected multiplicative density difference of exp(-16) = 1e-07. This isn't infinitesimal, but it's small enough that glmmTMB gets small enough differences in the log-likelihood that the optimizer stops. However, since the likelihood surface is almost flat, it can't compute curvature etc..

    You could lump together or drop categories or do some form of regularization (e.g. see here or here ...); it might also make some sense to treat your Keyword_1 variable as a random effect, which would also have the effect of regularizing the estimates.