I am trying to generate predictions from a gamm
model (from the mgcv
package) using the newdata
argument. I would like to make the predictions on the lme
part of the model, so that the predictions include the random effects. I am, however, running in to problems, I think, due to how the model coefficients are named.
My question is, how should the newdata
argument be structured / named to allow predictions. Thanks.
A mwe
mod <- gamm(outcome ~ s(time) + predvar, data=d,
random=list(groupvar=~1),
correlation = corARMA(form=~1|groupvar, p = 1))
# okay
pred <- predict(mod$lme)
# Not okay
pred <- predict(mod$lme, newdata=d)
which produces error
Error in predict.lme(mod$lme, newdata = d) : cannot evaluate groups for desired levels on 'newdata'
If I ran the model in nlme
without the spline terms, the newdata
executes without problems
mod2 <- lme(outcome ~ time + predvar, data=d,
random=list(groupvar=~1),
correlation = corARMA(form=~1|groupvar, p = 1))
# okay
pred2 <- predict(mod2, newdata=d)
d <- structure(list(time = c(0, 1, 2, 3, 4, 5, 6, 3, 4, 5, 6, 7, 8,
9, 10, 11, 12, 13, 14, 15, 16, 17, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14), outcome = c(-1.85, -1.57, -1.38, -1.22, -1.27, -1.63,
-2.07, -1.36, -0.33, 0.08, 0.3, 0.44, 0.78, 1.03, 1.13, 1.14,
1.05, 0.94, 0.73, 0.51, 0.08, 0.01, 0.42, 0.59, 0.71, 0.79, 0.87,
0.75, 0.6, 0.38, 0.01, -0.63), predvar = c(-1.83, -1.77, -1.7,
-1.84, -1.84, -1.72, -1.69, 0.01, -0.07, 0.16, -0.04, 0.04, 0.25,
0.19, 0.17, 0.22, 0.34, 0.54, 0.7, 0.81, 0.92, 1.12, 0.58, 0.63,
0.63, 0.68, 0.62, 0.56, 0.61, 0.73, 0.92, 1.07), groupvar = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("a",
"b", "c"), class = "factor")), .Names = c("time", "outcome",
"predvar", "groupvar"), row.names = c(NA, -32L), class = "data.frame")
info: I have not specified the random effects as splines (s(. , bs="re")) as my RE are more complicated than in the above example.
One way make predictions on new data, if you need the random effects is to do the predictions on the gam
part of the model, and add in the random effects.
Using the example above,
library(mgcv)
mod <- gamm(outcome ~ s(time) + predvar, data=d,
random=list(groupvar=~1),
correlation = corARMA(form=~1|groupvar, p = 1))
# For comparison: predict with RE: we cant use the newdata arg here
pred <- predict(mod$lme)
# Extract the random effects from the model and match with the relevant observation
re <- coef(mod$lme)[ncol(coef(mod$lme))]
pred_ref <- re[[1]][match(d$groupvar, gsub(".*/", "", rownames(re)) )]
# Predict on gam part of model and adjust for RE
pred2 <- as.vector(predict(mod$gam, data=d) - pred_ref)
# Compare
all.equal(pred, pred2, check.attributes = F, use.names = F)