Search code examples
rpredictgam

zero-inflated GAM prediction with newdata


In a zero-inflated GAM (ziplss), I'm getting a warning when 1) I use new data and 2) the count portion has categorical variables that are NOT in the zero-inflation portion. There's a warning for every categorical variable not represented in the zero-inflation part.

Here's a reproducible example:

library(mgcv)
library(glmmTMB)
data(Salamanders)   
Salamanders$x <- rnorm(nrow(Salamanders), 0, 10)

zipgam <- gam(list(count ~ spp * mined + s(x) + s(site, bs = "re"),
                ~ spp),
           data = Salamanders, family = ziplss, method = "REML")

preds.response <- data.frame(Predict = predict(zipgam, type = "response"))

nd <- data.frame(x = 0, spp = "GP", mined = "yes", site = Salamanders$site[1])      
nd$pred <- predict(zipgam, newdata = nd, exclude="site")

I haven't seen this mentioned anywhere, which is odd and tells me that I'm likely doing something wrong (otherwise this would be available in search results). Would appreciate any insight.


Solution

  • I think this is just an infelicity in the implementation. The warning I am seeing is:

    Warning message:
    In model.matrix.default(Terms[[i]], mf, contrasts = object$contrasts) :
      variable 'mined' is absent, its contrast will be ignored
    

    This is harmless (at least in this case; I haven't checked other cases) and is generated because there is only a single object$contrasts, and it contains details about mined but this variable is not present in the second linear predictor so R warns that it is going to ignore the contrasts for the mined variable, but this only happens when building the model matrix for the zero-inflation part of the model. The count part correctly uses the mined variable and the correct contrasts.

    You could argue that having $contrasts be a list, one per linear predictor would be a better design and then the model matrix would be created using:

    model.matrix.default(Terms[[i]], mf, contrasts = object$contrasts[[i]])
    

    but I have no idea if this would break everything else in mgcv.

    Currently $contrasts for this model is just:

    > zipgam$contrasts
    $spp
    [1] "contr.treatment"
    
    $mined
    [1] "contr.treatment"
    
    $spp
    [1] "contr.treatment"
    

    which already shows some redundancy.