Search code examples
rmcmcjags

How to specify zero-inflated negative binomial model in JAGS


I'm currently working on constructing a zero-inflated negative binomial model in JAGS to model yearly change in abundance using count data and am currently a bit lost on how best to specify the model. I've included an example of the base model I'm using below. The main issue I'm struggling with is that in the model output I'm getting poor convergence (high Rhat values, low Neff values) and the 95% credible intervals are huge. I realize that without seeing/running the actual data there's probably not much anyone can help with but I thought I'd at least try and see if there are any obvious errors in the way I have the basic model specified. I also tried fitting a variety of other model types (regular negative binomial, Poisson, and zero-inflated Poisson) but decided to go with the ZINB since it had the lowest DIC scores of all the models and also makes the most intuitive sense to me, given my data structure.

library(R2jags)

# Create example dataframe

years <- c(1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2)
sites <- c(1,1,1,2,2,2,3,3,3,1,1,1,2,2,2,3,3,3)
months <- c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3)

# Count data
day1 <- floor(runif(18,0,7))
day2 <- floor(runif(18,0,7))
day3 <- floor(runif(18,0,7))
day4 <- floor(runif(18,0,7))
day5 <- floor(runif(18,0,7))

df <- as.data.frame(cbind(years, sites, months, day1, day2, day3, day4, day5))


# Put count data into array
y <- array(NA,dim=c(2,3,3,5))                

for(m in 1:2){
  for(k in 1:3){
    sel.rows <- df$years == m & 
      df$months==k
    y[m,k,,] <- as.matrix(df)[sel.rows,4:8]
  }
}

# JAGS model
sink("model1.txt")
cat("
    model {
    
    # PRIORS
    
    for(m in 1:2){
      r[m] ~ dunif(0,50)
    }         
    t.int ~ dlogis(0,1)
    b.int ~ dlogis(0,1)
    p.det ~ dunif(0,1)
    
    
    # LIKELIHOOD
    # ECOLOGICAL SUBMODEL FOR TRUE ABUNDANCE
      for (m in 1:2) {  
    
        zero[m] ~ dbern(pi[m])
    
        pi[m] <- ilogit(mu.binary[m])

        mu.binary[m] <- t.int
        
        for (k in 1:3) {                          
        
          for (i in 1:3) {                        
    
            N[m,k,i] ~ dnegbin(p[m,k,i], r)
    
            p[m,k,i] <- r[m] / (r[m] + (1 - zero[m]) * lambda.count[m,k,i]) - 1e-10 * zero[m]
    
            lambda.count[m,k,i] <- exp(mu.count[m,k,i])
    
            log(mu.count[m,k,i]) <- b.int
    
    # OBSERVATIONAL SUBMODEL FOR DETECTION
    
            for (j in 1:5) {                     
            
              y[m,k,i,j] ~ dbin(p.det, N[m,k,i])   
                                                      
            }#j
          }#i
        }#k
      }#m

  }#END", fill=TRUE)
sink()

win.data <- list(y = y)

Nst <- apply(y,c(1,2,3),max)+1

inits <- function()list(N = Nst)

params <- c("N") 

nc <- 3
nt <- 1
ni <- 50000
nb <- 5000

out <- jags(win.data, inits, params, "model1.txt", 
            n.chains = nc, n.thin = nt, n.iter = ni, n.burnin = nb, 
            working.directory = getwd())

print(out)

Tried fitting a ZINB model in JAGS using the code specified above but am having issues with model convergence.


Solution

  • The way that I have tended to specify zero-inflated models is to model the data as being Poisson distributed with mean that is either zero if that individual is part of the zero-inflated group, or distributed according to a gamma distribution otherwise. Something like:

    Obs[i] ~ dpois(lambda[i] * is_zero[i])
    is_zero[i] ~ dbern(zero_prob)
    lambda[i] ~ dgamma(k, k/mean)
    

    Something similar to this was first used in this paper: https://www.researchgate.net/publication/5231190_The_distribution_of_the_pathogenic_nematode_Nematodirus_battus_in_lambs_is_zero-inflated

    These models usually converge OK, although the performance is not as good as for simpler models of course. You also need to make sure to supply initial values for is_zero so that the model starts with all individuals with positive counts in the appropriate group.

    In your case, you have multiple timepoints, so you need to decide if the zero-inflation is fixed over time points (i.e. an individual cannot switch to or from zero-inflated group over time), or if each observation is completely independent with respect to zero-inflation status. You also need to decide if you want to have co-variates of year/month/site affecting the mean count (i.e. the gamma part) or the probability of a positive count (i.e. the zero-inflation part). For the former, you need to index mean (in my formulation) by i and then use a GLM-like formula (probably using log link) to relate this to the appropriate covariates. For the latter, you need to index zero_prob by i and then use a GLM-like formula (probably using logit link) to relate this to the appropriate covariates. It is also possible to do both, but if you try to use the same covariates in both parts then you can expect convergence problems!

    It would arguably be better to replace the separate Poisson-Gamma distributions with a single Negative Binomial distribution using the 'ecology parameterisation' with mean and k. This is not currently implemented in JAGS, but I will add it for the next update.