Search code examples
rnlmegammgcv

Random effect predictions from gamm model error: cannot evaluate groups for desired levels on 'newdata'


I am trying to generate predictions from a gamm model (from the mgcv package) using the newdata argument. I would like to make the predictions on the lme part of the model, so that the predictions include the random effects. I am, however, running in to problems, I think, due to how the model coefficients are named.

My question is, how should the newdata argument be structured / named to allow predictions. Thanks.

A mwe

mod <- gamm(outcome ~ s(time) + predvar, data=d, 
                        random=list(groupvar=~1), 
                        correlation = corARMA(form=~1|groupvar, p = 1))     
# okay
pred <- predict(mod$lme)

# Not okay
pred <- predict(mod$lme, newdata=d)

which produces error

Error in predict.lme(mod$lme, newdata = d) : cannot evaluate groups for desired levels on 'newdata'


If I ran the model in nlme without the spline terms, the newdata executes without problems

mod2 <- lme(outcome ~ time + predvar, data=d, 
                        random=list(groupvar=~1), 
                        correlation = corARMA(form=~1|groupvar, p = 1))     
# okay
pred2 <- predict(mod2, newdata=d)

d <- structure(list(time = c(0, 1, 2, 3, 4, 5, 6, 3, 4, 5, 6, 7, 8, 
9, 10, 11, 12, 13, 14, 15, 16, 17, 5, 6, 7, 8, 9, 10, 11, 12, 
13, 14), outcome = c(-1.85, -1.57, -1.38, -1.22, -1.27, -1.63, 
-2.07, -1.36, -0.33, 0.08, 0.3, 0.44, 0.78, 1.03, 1.13, 1.14, 
1.05, 0.94, 0.73, 0.51, 0.08, 0.01, 0.42, 0.59, 0.71, 0.79, 0.87, 
0.75, 0.6, 0.38, 0.01, -0.63), predvar = c(-1.83, -1.77, -1.7, 
-1.84, -1.84, -1.72, -1.69, 0.01, -0.07, 0.16, -0.04, 0.04, 0.25, 
0.19, 0.17, 0.22, 0.34, 0.54, 0.7, 0.81, 0.92, 1.12, 0.58, 0.63, 
0.63, 0.68, 0.62, 0.56, 0.61, 0.73, 0.92, 1.07), groupvar = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("a", 
"b", "c"), class = "factor")), .Names = c("time", "outcome", 
"predvar", "groupvar"), row.names = c(NA, -32L), class = "data.frame")

info: I have not specified the random effects as splines (s(. , bs="re")) as my RE are more complicated than in the above example.


Solution

  • One way make predictions on new data, if you need the random effects is to do the predictions on the gam part of the model, and add in the random effects.

    Using the example above,

    library(mgcv)
    
    mod <- gamm(outcome ~ s(time) + predvar, data=d, 
                            random=list(groupvar=~1), 
                            correlation = corARMA(form=~1|groupvar, p = 1))     
    # For comparison: predict with RE: we cant use the newdata arg here
    pred <- predict(mod$lme)
    
    # Extract the random effects from the model and match with the relevant observation
    re <- coef(mod$lme)[ncol(coef(mod$lme))]
    pred_ref <- re[[1]][match(d$groupvar,   gsub(".*/", "", rownames(re)) )]
    
    # Predict on gam part of model and adjust for RE
    pred2 <- as.vector(predict(mod$gam, data=d) - pred_ref)
    
    # Compare
    all.equal(pred, pred2, check.attributes = F, use.names = F)