
How to extract fitted splines from a GAM (`mgcv::gam`)


I am using a GAM to model time trends in a logistic regression. However, I would like to extract the fitted spline from it and add it to another model that cannot be fitted with GAM or GAMM.

Thus I have 2 questions:

  1. How can I fit a smoother over time so that one knot is forced to be at a particular location while the model finds the other knots?

  2. How can I extract the matrix from the fitted GAM so that I can use it as an input for a different model?

The models I am running are of the following form:

gam <- gam(mortality.under.2~ maternal_age_c+ I(maternal_age_c^2)+
           s(birth_year,by=wealth2) + wealth2 + sex +
           residence + maternal_educ + birth_order,
           data=colombia2, family="binomial")

I've read the extensive documentation for GAM but I am still not sure. Any suggestion is really appreciated.


Solution

  • In mgcv::gam there is a way to do this (your Q2), via the predict.gam method and type = "lpmatrix".

    ?predict.gam even has an example, which I reproduce below:

     library(mgcv)
     n <- 200
     sig <- 2
     dat <- gamSim(1,n=n,scale=sig)
     
     b <- gam(y ~ s(x0) + s(I(x1^2)) + s(x2) + offset(x3), data = dat)
     
     newd <- data.frame(x0=(0:30)/30, x1=(0:30)/30, x2=(0:30)/30, x3=(0:30)/30)
     
     Xp <- predict(b, newd, type="lpmatrix")
    
     ##################################################################
     ## The following shows how to use an "lpmatrix" as a lookup 
     ## table for approximate prediction. The idea is to create 
     ## approximate prediction matrix rows by appropriate linear 
     ## interpolation of an existing prediction matrix. The additivity 
     ## of a GAM makes this possible. 
     ## There is no reason to ever do this in R, but the following 
     ## code provides a useful template for predicting from a fitted 
     ## gam *outside* R: all that is needed is the coefficient vector 
     ## and the prediction matrix. Use larger `Xp'/ smaller `dx' and/or 
     ## higher order interpolation for higher accuracy.  
     ###################################################################
     
     xn <- c(.341,.122,.476,.981) ## want prediction at these values
     x0 <- 1         ## intercept column
     dx <- 1/30      ## covariate spacing in `newd'
     for (j in 0:2) { ## loop through smooth terms
       cols <- 1+j*9 +1:9      ## relevant cols of Xp
       i <- floor(xn[j+1]*30)  ## find relevant rows of Xp
       w1 <- (xn[j+1]-i*dx)/dx ## interpolation weights
       ## find approx. predict matrix row portion, by interpolation
       x0 <- c(x0,Xp[i+2,cols]*w1 + Xp[i+1,cols]*(1-w1))
     }
     dim(x0)<-c(1,28) 
     fv <- x0%*%coef(b) + xn[4];fv    ## evaluate and add offset
     se <- sqrt(x0%*%b$Vp%*%t(x0));se ## get standard error
     ## compare to normal prediction
     predict(b,newdata=data.frame(x0=xn[1],x1=xn[2],
             x2=xn[3],x3=xn[4]),se=TRUE)
    

    That goes through the entire process, including the prediction step that would be done outside R (or outside the GAM model). You will have to modify the example a bit to do what you want, because the example evaluates all terms in the model, whereas your model has several parametric terms besides the spline. Essentially you do the same thing, but only for the spline terms, which means finding the columns and rows of the Xp matrix that belong to the spline. Note also that the spline is centred (mgcv applies a sum-to-zero identifiability constraint), so you may or may not want to undo that too.
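    A minimal sketch of selecting only the spline's columns, using simulated gamSim data rather than your data (the model formula and the 50-point grid are illustrative assumptions). Each smooth object in b$smooth records the first and last coefficient indices it owns (first.para, last.para), so multiplying those columns of the lpmatrix by the matching coefficients gives the spline alone, and the same block of b$Vp gives its standard error. Looking the smooth up by its label also works for by-factor smooths, whose labels look like "s(birth_year):wealth2<level>".

```r
library(mgcv)

set.seed(1)
dat <- gamSim(1, n = 200, scale = 2)              # simulated example data
b <- gam(y ~ s(x0) + s(x2) + x1, data = dat)      # one parametric term, two smooths

# Vary the spline covariate; hold the other covariates at their means
newd <- data.frame(x0 = seq(0, 1, length.out = 50),
                   x1 = mean(dat$x1),
                   x2 = mean(dat$x2))
Xp <- predict(b, newd, type = "lpmatrix")

# Find the smooth by its label, then the columns of Xp / coef(b) it owns
labels <- vapply(b$smooth, function(s) s$label, character(1))
sm <- b$smooth[[which(labels == "s(x0)")]]
cols <- sm$first.para:sm$last.para

# Spline contribution (linear predictor scale) and its standard error
f_x0  <- drop(Xp[, cols] %*% coef(b)[cols])
se_x0 <- sqrt(rowSums((Xp[, cols] %*% b$Vp[cols, cols]) * Xp[, cols]))

# Compare with the built-in per-term prediction
tm <- predict(b, newd, type = "terms")
all.equal(unname(f_x0), unname(tm[, "s(x0)"]))
```

    The last line should print TRUE, since predict.gam computes the "terms" column for a smooth from exactly these columns and coefficients. The hand-built version is mainly useful when the prediction has to happen outside R.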

    For your Q1, choose appropriate values for the xn vector/matrix in the example. Each element of xn is the value at which the corresponding term is evaluated (xn[1] for s(x0), xn[2] for s(I(x1^2)), and so on). So set the ones you want fixed to some mean value and then vary the one associated with the spline.

    If you are doing all of this in R, it is easier to evaluate the spline directly at the values of the covariate that will go into the other model. Create a data frame of the values at which to predict, then use

    predict(mod, newdata = newdat, type = "terms")
    

    where mod is the fitted GAM model (via mgcv::gam) and newdat is a data frame containing a column for each variable in the model, including the parametric terms. Set the variables you don't want to vary to some constant value (say, the mean of the variable in the data set) or to a fixed level if it is a factor. The type = "terms" part returns a matrix with one row per row of newdat and one column per term, holding each term's "contribution" to the linear predictor (the logit scale for your binomial model), including the spline term. Just take the column of this matrix that corresponds to the spline; again, it is centred. Because your smooth uses by = wealth2, there will be one spline column per level of wealth2.
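    A minimal sketch of this for a model shaped like yours, on simulated stand-in data (the names year, grp and age, and the data-generating process, are hypothetical placeholders for birth_year, wealth2 and your other covariates). It shows the per-level spline columns produced by a factor by variable and one possible way to carry the spline into another model; using it as a glm covariate is just an illustration.

```r
library(mgcv)

set.seed(2)
n <- 2000
# Hypothetical stand-in data: time covariate, `by` factor, one other covariate
dat <- data.frame(year = runif(n, 1990, 2010),
                  grp  = factor(sample(c("low", "high"), n, replace = TRUE)),
                  age  = rnorm(n))
eta <- -1 + 0.3 * dat$age +
  ifelse(dat$grp == "low", sin((dat$year - 1990) / 4), 0)
dat$y <- rbinom(n, 1, plogis(eta))

# Factor `by` smooth plus the factor main effect, as in the question
m <- gam(y ~ s(year, by = grp) + grp + age, data = dat, family = binomial)

# Prediction grid: vary year within each level, hold age at its mean
newdat <- data.frame(year = rep(1990:2010, times = 2),
                     grp  = factor(rep(c("low", "high"), each = 21),
                                   levels = levels(dat$grp)),
                     age  = mean(dat$age))
tm <- predict(m, newdata = newdat, type = "terms")
colnames(tm)   # one spline column per level of grp

# Spline contribution (logit scale) for each original observation:
# only the column for an observation's own level is non-zero
sp_cols <- grep("^s\\(year\\)", colnames(tm), value = TRUE)
dat$f_year <- rowSums(predict(m, type = "terms")[, sp_cols])

# The extracted spline can then enter another model, e.g. as a covariate
fit2 <- glm(y ~ f_year + age, family = binomial, data = dat)
```

    Summing the per-level columns works because a factor by smooth is multiplied by a 0/1 indicator for its level, so each row has at most one non-zero spline column.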

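    Perhaps I misunderstood your Q1. If you want to control the knots, see the knots argument to mgcv::gam. For knot-based bases such as the cubic regression spline (bs = "cr"), mgcv::gam places a knot at each extreme of the data and spreads the remaining knots evenly through the unique covariate values; the default thin plate regression spline (bs = "tp") does not use knots in this sense. Either way, mgcv::gam doesn't search for knot locations: it places them for you, the penalty controls the wiggliness, and you can control where the knots go via the knots argument. For a "cr" smooth the supplied knot vector must contain exactly k values, so you supply all of the knots, including the one you want fixed.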
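    A minimal sketch of forcing one knot with a cubic regression spline, on simulated gamSim data. The forced location (0.5), the total of 10 knots, and placing the other knots at quantiles of the unique covariate values are assumptions; the quantile placement only roughly mimics mgcv's default rather than reproducing it.

```r
library(mgcv)

set.seed(3)
dat <- gamSim(1, n = 400, scale = 2)

forced <- 0.5   # hypothetical location where a knot must sit
nk <- 10        # total number of knots, including the forced one

# Spread the other nk - 1 knots evenly through the unique covariate values
# (roughly mimicking mgcv's default placement), then add the forced knot
other <- quantile(unique(dat$x2), probs = seq(0, 1, length.out = nk - 1),
                  names = FALSE)
kn <- sort(unique(c(other, forced)))

# For bs = "cr", k must equal the number of supplied knots
b_cr <- gam(y ~ s(x2, bs = "cr", k = length(kn)), data = dat,
            knots = list(x2 = kn))

b_cr$smooth[[1]]$xp   # knot locations actually used by the fit
```

    The penalty, not the knot placement, still decides how wiggly the fitted curve is, so the forced knot only guarantees that the basis can bend at that point.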