Search code examples
rregressionpredictgammgcv

mgcv: How to use 'exclude' argument in predict.gam?


I have a model structured as follows, and I would like to extract the predicted values while ignoring the random effect. As specified in ?predict.gam and here, I am using the exclude argument, but I am getting an error. Where is my mistake?

dt <- data.frame(n1 = runif(500, min=0, max=1),
             n2 = rep(1:10,50), 
             n3 = runif(500, min=0, max=2),
             n4 = runif(500, min=0, max=2),
             c1 = factor(rep(c("X","Y"),250)),
             c2 = factor(rep(c("a", "b", "c", "d", "e"), 100)))

mod = gam(n1 ~ 
           s(n2, n3, n4, by=c1) +
           s(c2, bs="re"),
         data=dt)

newd=data.table(expand.grid(n1=seq(min(dt$n1), max(dt$n1), 0.5), 
                        n2=1:10,
                        n3=seq(min(dt$n3), max(dt$n3), 0.5),
                        n4=seq(min(dt$n4), max(dt$n4), 0.5),
                        c1=c("X", "Y")))
newd$pred <- predict.gam(mod, newd, exclude = "s(c2)")

In predict.gam(mod, newd, exclude = "s(c2)"): not all required variables have been supplied in  newdata! 

Solution

  • exclude does not work in the way as you assumed. You still need to provide all variables in your newd for predict.gam. See my this answer for what is behind predict.gam.

    Here is what you need to do:

    ## pad newd with an arbitrary value for variable c2
    newd$c2 <- "a"
    ## termwise prediction
    pt <- predict.gam(mod, newd, type = "terms", exclude = "s(c2)")
    ## linear predictor without random effect
    lp_no_c2 <- rowSums(pt) + attr(pt, "constant")