Search code examples
rcountregressionpoissonglmmtmb

Adding two-way fixed effects for zero inflated Poisson model


I am working with a zero inflated Poisson model. I am using the glmmTMB

My outcome variable is the number of burglaries in all of Texas between 2000 and 2015. My unit of analysis are counties (county). My main variable of interest is male unemployment rate

Here is my baseline model

z1 <- glmmTMB(burglaries ~ male.unemployment + growth + gun.ownership + temperature 
+ offset(log(county.pop)), data = texas.data, 
family = poisson, ziformula=~1)

This works fine. I want to add year (year) and county (county) fixed effects to this model. Below is my best. I am pretty sure that my notation is off and that I am just adding random effects (the summary output says so). What would the correct notation be?

z1 <- glmmTMB(burglaries ~ male.unemployment + growth + gun.ownership + temperature 
+ offset(log(county.pop)) + (1|id+year), data = texas.data,
 family = poisson, ziformula=~1)

Thank you!


Solution

  • If you really want to add id and year as fixed effects you can do it as @YouLocalUser suggested:

    z1 <- glmmTMB(burglaries ~ male.unemployment + growth + gun.ownership +
       temperature + offset(log(county.pop)) + factor(id) + factor(year), 
       data = texas.data, family = poisson, ziformula=~1) 
    

    (I use factor() instead of as.factor() for brevity). However, it would be wise to specify sparseX = c(cond=TRUE) to allow a sparse model matrix for the conditional model*. Alternatively, slightly more efficiently, you could specify random effects but use the start and map arguments to fix their variances at a very large value, e.g.

    z1 <- glmmTMB(burglaries ~ male.unemployment + growth + gun.ownership +
       temperature + offset(log(county.pop)) + (1|id) + (1|year), 
       data = texas.data, family = poisson, ziformula=~1,
       start = list(theta = c(100, 100)),
       map = list(theta = factor(c(NA, NA)))
    

    This will set the random-effects variances at exp(100) (2.7e43), which should be approximately infinite (== fixed effects) for practical purposes ...


    * Since there are 240 counties in Texas and 16 years in the data set, and only four continuous covariates, the fixed-effects model matrix will be sparse (97.3% zeros, in this case):

    dd <- expand.grid(f = factor(1:240), year = factor(1:16))
    vars <- paste0("v", 1:4)
    others <- replicate(4, rnorm(nrow(dd))) |> as.data.frame() |> setNames(vars)
    dd2 <- cbind(dd, others)
    form <- reformulate(c(vars, names(dd)))
    Xd <- model.matrix(form, dd2)
    Xs <- Matrix::sparse.model.matrix(form, dd2)
    mean(Xd==0)  ## 0.973
    format(object.size(Xd), "Mb")  ## 7.8 Mb
    format(object.size(Xs), "Mb")  ## 0.6 Mb