Search code examples
rregressionsplinegammgcv

mgcv: How to set number and / or locations of knots for splines


I want to use function gam in mgcv packages:

 x <- seq(0,60, len =600)
 y <- seq(0,1, len=600) 
 prova <- gam(y ~ s(x, bs='cr')

can I set the number of knots in s()? and then can I know where are the knots that the spline used? Thanks!


Solution

  • While setting k is the correct way to go, fx = TRUE is definitely not right: it will force using pure regression spline without penalization.


    locations of knots

    For penalized regression spline, the exact locations are not important, as long as:

    • k is adequately big;
    • the spread of knots has good, reasonable coverage.

    By default:

    • natural cubic regression spline bs = 'cr' places knots by quantile;
    • B-splines family (bs = 'bs', bs = 'ps', bs = 'ad') place knots evenly.

    Compare the following:

    library(mgcv)
    
    ## toy data
    set.seed(0); x <- sort(rnorm(400, 0, pi))  ## note, my x are not uniformly sampled
    set.seed(1); e <- rnorm(400, 0, 0.4)
    y0 <- sin(x) + 0.2 * x + cos(abs(x))
    y <- y0 + e
    
    ## fitting natural cubic spline
    cr_fit <- gam(y ~ s(x, bs = 'cr', k = 20))
    cr_knots <- cr_fit$smooth[[1]]$xp  ## extract knots locations
    
    ## fitting B-spline
    bs_fit <- gam(y ~ s(x, bs = 'bs', k = 20))
    bs_knots <- bs_fit$smooth[[1]]$knots  ## extract knots locations
    
    ## summary plot
    par(mfrow = c(1,2))
    plot(x, y, col= "grey", main = "natural cubic spline");
    lines(x, cr_fit$linear.predictors, col = 2, lwd = 2)
    abline(v = cr_knots, lty = 2)
    plot(x, y, col= "grey", main = "B-spline");
    lines(x, bs_fit$linear.predictors, col = 2, lwd = 2)
    abline(v = bs_knots, lty = 2)
    

    enter image description here

    You can see the difference in knots placement.


    Setting your own knots locations:

    You can also provide your customized knots locations via the knots argument of gam() (yes, knots are not fed to s(), but to gam()). For example, you can do evenly spaced knots for cr:

    xlim <- range(x)  ## get range of x
    myfit <- gam(y ~ s(x, bs = 'cr', k = 20),
             knots = list(x = seq(xlim[1], xlim[2], length = 20)))
    

    Now you can see that:

    my_knots <- myfit$smooth[[1]]$xp
    plot(x, y, col= "grey", main = "my knots");
    lines(x, myfit$linear.predictors, col = 2, lwd = 2)
    abline(v = my_knots, lty = 2)
    

    enter image description here

    However, there is usually no need to set knots yourself. But if you do want to do this, you must be clear what you are doing. In particular, the number of knots you provide must not conflict with the k in s().

    This is a very rich answer. The length of bs_knots is 24. The "dimension" of the spline basis is in bs_fit$smooth[[1]]$bs.dim, which is 20.

    Yes, for B-splines family, the number of B-splines does not equal the number of knots. Knots placement for B-splines is a dirty work and error-prone. See https://stackoverflow.com/a/72723391/4891738 for a demonstration with B-splines.