Search code examples
predictsplinegammgcv

Generate predictions for GAM (mgcv) using penalised cubic spline forced through a certain point


I am using MGCV to generate predictions for a GAM using a cubic spline forced through a certain point. I have generated the gam exactly as outlined in this post: https://stat.ethz.ch/pipermail/r-help/2013-March/350253.html

However, when I try to predict y for new values of x within the range of the gam, I get this error: Error: variable 'X' was fitted with type "nmatrix.8" but type "numeric" was supplied.

Here is the script I’m using to try to make the predictions:

  1. From the original post
## Fake some data...

library(mgcv)
set.seed(0)
n <- 100
x <- runif(n)*4-1;x <- sort(x);
f <- exp(4*x)/(1+exp(4*x));y <- f+rnorm(100)*0.1;plot(x,y)
dat <- data.frame(x=x,y=y)

## Create a spline basis and penalty, making sure there is a knot
## at the constraint point, (0 here, but could be anywhere)
knots <- data.frame(x=seq(-1,3,length=9)) ## create knots
## set up smoother...
sm <- smoothCon(s(x,k=9,bs="cr"),dat,knots=knots)[[1]]

## 3rd parameter is value of spline at knot location 0,
## set it to 0 by dropping...
X <- sm$X[,-3]        ## spline basis
S <- sm$S[[1]][-3,-3] ## spline penalty
off <- y*0 + .6       ## offset term to force curve through (0, .6)

## fit spline constrained through (0, .6)...
b <- gam(y ~ X - 1 + offset(off),paraPen=list(X=list(S)))
lines(x,predict(b))
  1. To predict:
library(tidyverse)
  
 # Predict values across concentrations within measured range
  x_seq <- seq(-1, 3, length.out = 10000)
  newdata <- data.frame(X = x_seq)

  # Generate the spline basis matrix for the new data
  sm_new <- smoothCon(s(X, k=9, bs="cr"), newdata, knots=knots)[[1]]
  X_new <- sm_new$X[,-3]  # Match the basis matrix structure used in the model
  
  # Create a data frame with the spline basis matrix
  X_new_df <- as.data.frame(X_new)
    
  # Predict the offset term for the new data
  off_new <- y*0 + .6 # Use the same offset as in the original model
  # Combine the new data with the basis matrix and the offset
  predict_data <- cbind(newdata, X_new_df, off_new) %>% 
    rename(off = off_new)

  # Predict values using the GAM
  predictions <- predict(b, newdata = predict_data, se.fit = TRUE)

I have also tried to put the variables required for a function into a list so they can be called as an argument to ‘data’, but this creates a separate issue wherein the matrix (X) is not recognised: Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 27, 8

Here is the adjusted script I use to build this model, which generates the error:

list.dat <- list(y = y, 
                   X = X, 
                   S = S, 
                   off = off)

b <- gam(y ~ X - 1 + offset(off), 
                     data = list.dat, paraPen=list(X=list(S)))

Can you help me generate predictions for y along more values of x than in the original data? I would prefer to give my variables to the model as a list in the data argument, but I am open to other suggestions too.


Solution

  • You can now do this directly with s() via its pc argument (for "point constraint"):

    library(mgcv)
    set.seed(0)
    n <- 100
    x <- runif(n) * 4 - 1
    x <- sort(x)
    f <- exp(4 * x) / (1 + exp(4 * x))
    y <- f + rnorm(100) * 0.1
    dat <- data.frame(x = x, y = y)
    
    m <- gam(y ~ s(x, pc = 0), data = dat, method = "REML")
    

    Then predict() etc works as expected:

    library("gratia")
    
    ds <- data_slice(m, x = evenly(x))
    fv <- fitted_values(m, data = ds) # calls predict
    fv
    
    # A tibble: 100 × 6
        .row      x .fitted    .se .lower_ci .upper_ci
       <int>  <dbl>   <dbl>  <dbl>     <dbl>     <dbl>
     1     1 -0.946  0.0364 0.0526  -0.0666      0.140
     2     2 -0.907  0.0375 0.0463  -0.0534      0.128
     3     3 -0.867  0.0386 0.0406  -0.0410      0.118
     4     4 -0.828  0.0402 0.0358  -0.0299      0.110
     5     5 -0.788  0.0424 0.0319  -0.0201      0.105
     6     6 -0.749  0.0455 0.0292  -0.0118      0.103
     7     7 -0.709  0.0497 0.0277  -0.00467     0.104
     8     8 -0.670  0.0554 0.0272   0.00202     0.109
     9     9 -0.630  0.0627 0.0273   0.00923     0.116
    10    10 -0.591  0.0720 0.0276   0.0179      0.126
    # ℹ 90 more rows
    # ℹ Use `print(n = ...)` to see more rows
    

    And we can confirm the point constraint via a plot of the smooth:

    draw(m) # or plot(m)
    

    plot of the estimated smooth, showing the smooth passing through y == 0 at the value x == 0