Changing predicted values to response scale after predict() function, not via type = "response"


I'm working in R and having trouble getting predicted values on the response scale when I have to exclude a random effect from the prediction. To exclude the random effect I specify type = "terms", which makes it impossible to also pass type = "response". Is there a way to convert the predicted values to the response scale afterwards (this is a beta regression)? Or is it possible to specify both the exclusion of Area and type = "response" in the predict function? Please see my code below.

str(data_re)
# 'data.frame': 35 obs. of  17 variables:
# $ ProportionBirdsScavenging: num  0.6619 0.4062 0.6943 0.0143 0.0143 ...
# $ OverheadCover            : num  0.7 0.671 0.679 0.79 0.62 ...
# $ Area                     : Factor w/ 6 levels "Hamert","KempenBroek",..: 3 1 1 1 1 1 1 1 1 2 ...
# $ pointWeight              : int  3 233 10 89 4 22 44 99 89 17 ...

library(mgcv) # betar() and predict.gam() are only found once mgcv is attached
mygam <- gam(ProportionBirdsScavenging ~ OverheadCover + s(Area, bs = "re"),
             family = betar(link = "logit"), data = data_re, weights = pointWeight)
new.xgam <- expand.grid(OverheadCover = seq(0, 1, length.out = 1000))
# pad new.xgam with an arbitrary value for variable Area -> https://stackoverflow.com/questions/54411851/mgcv-how-to-use-exclude-argument-in-predict-gam
new.xgam$Area <- "a"
# Because I have to specify type = "terms", I can't specify type = "response".
new.ygam <- predict.gam(mygam, newdata = new.xgam, type = "terms", exclude = "s(Area)")
new.ygam <- data.frame(new.ygam)

head(new.ygam) # not on the response scale (0,1)
# OverheadCover
# 1   0.000000000
# 2  -0.004390776
# 3  -0.008781551
# 4  -0.013172327
# 5  -0.017563103
# 6  -0.021953878

Solution

  • You're misreading the documentation for the argument exclude:

    exclude: if type=="terms" or type="iterms" then terms (smooth or parametric) named in this array will not be returned. Otherwise any smooth terms named in this array will be set to zero. If NULL then no terms are excluded. Note that this is the term names as it appears in the model summary, see example. You can avoid providing the covariates for the excluded terms by setting newdata.guaranteed=TRUE, which will avoid all checks on newdata.

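    The important part is the second sentence: for any type other than "terms" or "iterms", the smooth terms named in exclude are set to zero.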

    So you can call predict(mygam, newdata = new.xgam, type = "response", exclude = "s(Area)") and the random effect should be ignored. You still have to supply some values for Area in newdata, otherwise this won't work; just set the whole Area column in newdata to the first level of Area.
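
    A minimal sketch of this call, assuming simulated data as a stand-in for data_re. The data frame dat, the response y, and the area labels A to D are made up for illustration; only gam(), betar(), and predict() with exclude come from mgcv.

    ```r
    library(mgcv)

    set.seed(1)
    n <- 200
    # Hypothetical data: a covariate in (0, 1) and a grouping factor
    dat <- data.frame(OverheadCover = runif(n),
                      Area = factor(sample(c("A", "B", "C", "D"), n, replace = TRUE)))
    area_eff <- c(A = -0.5, B = 0.2, C = 0.6, D = -0.3)
    mu <- plogis(0.5 - 1.5 * dat$OverheadCover + area_eff[as.character(dat$Area)])
    dat$y <- rbeta(n, mu * 10, (1 - mu) * 10) # beta-distributed response in (0, 1)

    m <- gam(y ~ OverheadCover + s(Area, bs = "re"),
             family = betar(link = "logit"), data = dat, method = "REML")

    # Area must be present in newdata; fill it with the first level
    nd <- data.frame(OverheadCover = seq(0, 1, length.out = 100),
                     Area = factor(levels(dat$Area)[1], levels = levels(dat$Area)))
    p <- predict(m, newdata = nd, type = "response", exclude = "s(Area)")

    # Repeat with a different Area level; s(Area) is zeroed, so nothing changes
    nd2 <- nd
    nd2$Area <- factor(rep(levels(dat$Area)[2], nrow(nd2)), levels = levels(dat$Area))
    p2 <- predict(m, newdata = nd2, type = "response", exclude = "s(Area)")

    range(p)                                 # on the response scale
    all.equal(as.numeric(p), as.numeric(p2)) # level of Area is irrelevant
    ```

    The second prediction is only there as a check: if the excluded smooth really contributes zero, the level used to pad Area cannot matter.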

    If you are very careful you can avoid passing in the random-effect variable too. If you are sure that what you pass to newdata is a correctly specified set of variables for the model, you can leave out Area and pass newdata.guaranteed = TRUE to predict(), which stops it from checking that you have supplied every variable the model needs.
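
    A sketch of the newdata.guaranteed route, again on simulated data (dat, y, and the area labels are hypothetical). It assumes a version of mgcv recent enough to skip excluded smooths when building the prediction matrix, as the quoted documentation describes.

    ```r
    library(mgcv)

    set.seed(1)
    n <- 200
    # Hypothetical data standing in for data_re
    dat <- data.frame(OverheadCover = runif(n),
                      Area = factor(sample(c("A", "B", "C", "D"), n, replace = TRUE)))
    area_eff <- c(A = -0.5, B = 0.2, C = 0.6, D = -0.3)
    mu <- plogis(0.5 - 1.5 * dat$OverheadCover + area_eff[as.character(dat$Area)])
    dat$y <- rbeta(n, mu * 10, (1 - mu) * 10)

    m <- gam(y ~ OverheadCover + s(Area, bs = "re"),
             family = betar(link = "logit"), data = dat, method = "REML")

    # newdata without an Area column at all
    nd_no_area <- data.frame(OverheadCover = seq(0, 1, length.out = 100))
    p_guar <- predict(m, newdata = nd_no_area, type = "response",
                      exclude = "s(Area)", newdata.guaranteed = TRUE)

    # Reference: the same grid padded with the first level of Area
    nd_area <- nd_no_area
    nd_area$Area <- factor(rep(levels(dat$Area)[1], nrow(nd_area)),
                           levels = levels(dat$Area))
    p_ref <- predict(m, newdata = nd_area, type = "response", exclude = "s(Area)")

    all.equal(as.numeric(p_guar), as.numeric(p_ref))
    ```

    Because newdata.guaranteed = TRUE switches off all checks, a misspelled or missing covariate for a term that is not excluded would no longer produce a helpful error, so the padded version is the safer default.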

    See the example in ?mgcv::random.effects for both types of behaviour.
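
    On the other half of the question, term-wise predictions can also be converted by hand: add the intercept back to the summed terms and apply the inverse link. This is a sketch on simulated data (dat, y, and the area labels are hypothetical), and it relies on predict() with type = "terms" leaving out the intercept.

    ```r
    library(mgcv)

    set.seed(1)
    n <- 200
    # Hypothetical data standing in for data_re
    dat <- data.frame(OverheadCover = runif(n),
                      Area = factor(sample(c("A", "B", "C", "D"), n, replace = TRUE)))
    area_eff <- c(A = -0.5, B = 0.2, C = 0.6, D = -0.3)
    mu <- plogis(0.5 - 1.5 * dat$OverheadCover + area_eff[as.character(dat$Area)])
    dat$y <- rbeta(n, mu * 10, (1 - mu) * 10)

    m <- gam(y ~ OverheadCover + s(Area, bs = "re"),
             family = betar(link = "logit"), data = dat, method = "REML")

    nd <- data.frame(OverheadCover = seq(0, 1, length.out = 100),
                     Area = factor(levels(dat$Area)[1], levels = levels(dat$Area)))

    # Term-wise contributions on the link scale, without s(Area)
    tt <- predict(m, newdata = nd, type = "terms", exclude = "s(Area)")

    # Linear predictor = intercept + remaining terms; then apply the inverse logit
    eta <- coef(m)[["(Intercept)"]] + rowSums(tt)
    p_manual <- m$family$linkinv(eta)

    # Should match the direct response-scale prediction
    p_direct <- predict(m, newdata = nd, type = "response", exclude = "s(Area)")
    all.equal(as.numeric(p_manual), as.numeric(p_direct))
    ```

    The direct call with type = "response" is shorter and less error-prone; the manual route is mainly useful when the link-scale terms are needed anyway, for example to build confidence bands before back-transforming.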