I am trying to fit my data to a zero-inflated negative binomial model but one of my 3 independent variables (Exposure) seems to be causing NaNs to be produced (see very end of zeroinfl call) when the SE is being calculated in the summary function. I have also tried running a negative binomial hurdle model and am running into a similar issue.
str(eggTreat)
'data.frame': 455 obs. of 4 variables:
$ Exposure : Factor w/ 2 levels "C","E": 2 2 2 2 2 2 2 2 2 2 ...
$ hi_lo : Factor w/ 2 levels "hi","lo": 2 2 2 2 2 2 2 2 2 2 ...
$ Egg_count: int 0 0 0 0 0 0 0 0 0 0 ...
$ Food : Factor w/ 2 levels "1.5A5YS","5ASMQ": 2 2 2 2 2 2 2 2 2 2 ...
mod.zeroinfl <- zeroinfl(Egg_count ~ Food+Exposure+hi_lo | Food+Exposure+hi_lo, data=eggTreat,
+ dist="negbin")
> summary(mod.zeroinfl)
Call:
zeroinfl(formula = Egg_count ~ Food + Exposure + hi_lo | Food + Exposure + hi_lo, data = eggTreat, dist = "negbin")
Pearson residuals:
Min 1Q Median 3Q Max
-0.65632 -0.47163 -0.28588 0.02976 9.00804
Count model coefficients (negbin with log link):
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.04435 0.14393 -0.308 0.7580
Food -1.12486 0.22267 -5.052 4.38e-07 ***
Exposure -2.34990 0.38684 -6.075 1.24e-09 ***
hi_lo -0.44893 0.19524 -2.299 0.0215 *
Log(theta) -0.24387 0.22639 -1.077 0.2814
Zero-inflation model coefficients (binomial with logit link):
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.830e+01 NA NA NA
Food -5.768e+00 5.628e+04 0 1
Exposure 4.612e-01 NA NA NA
hi_lo -7.477e+00 9.963e+05 0 1
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Theta = 0.7836
Number of iterations in BFGS optimization: 21
Log-likelihood: -350.2 on 9 Df
Warning message:
In sqrt(diag(object$vcov)) : NaNs produced
function (object, ...)
{
object$residuals <- residuals(object, type = "pearson")
kc <- length(object$coefficients$count)
kz <- length(object$coefficients$zero)
se <- sqrt(diag(object$vcov))
This problem is typically caused by complete separation; using this search term, or searching for the Hauck-Donner effect, will show you that the problem is that there is some linear combination of your predictor values that perfectly separates the zeros and non-zeros (since the predictor variables in your zero-inflation are all categorical, this translates to a combination of categories where all the values are zero or non-zero).
I would take a look at with(eggTreat, table(eggcount>0, Food, Exposure, hi_lo))
(arrange the arguments in whatever order makes the table easiest to read).
Typical symptoms include:
|beta|>10
); in this case your intercept is -18.3, which gives a predicted zero-inflation probability of 1e-8
in the baseline category (two of the other values are also large, although not nearly as extreme as the intercept)Food
, hi_lo
), leading to z-values that are effectively zero and p-values of effectively 1NA
values you're seeingThere are various solutions to this problem:
Zero-inflation model coefficients (binomial with logit link):
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.830e+01 NA NA NA
Food -5.768e+00 5.628e+04 0 1
Exposure 4.612e-01 NA NA NA
hi_lo -7.477e+00 9.963e+05 0 1