
`gam` package: extra shift spotted when sketching data on `plot.gam`


I am trying to fit a GAM using the gam package (I know mgcv is more flexible, but I need to use gam here). The fitted model looks good, but compared with the original data it appears to be offset along the y-axis by a constant, and I cannot figure out where that constant comes from.

This code reproduces the problem:

library(gam)
data(gam.data)
x <- gam.data$x
y <- gam.data$y
fit <- gam(y ~ s(x,6))

fit$coefficients
#(Intercept)     s(x, 6) 
#   1.921819   -2.318771

plot(fit, ylim = range(y))
points(x, y)
points(x, y -1.921819, col=2)
legend("topright", pch=1, col=1:2, legend=c("Original", "Minus intercept"))


Chambers, J. M. and Hastie, T. J. (1993) Statistical Models in S (Chapman & Hall) shows that there should not be an offset, and this is also intuitively correct (the smooth should describe the data).

I noticed something similar in mgcv, where it can be solved by passing the model's intercept as the shift parameter (because the smooth is apparently centred). I thought the same might be true here, so I subtracted the intercept from the original data points. However, the plot above shows that this is not enough. I don't know where the extra shift comes from, and I hope someone here can help.

(R version 3.3.1; gam version 1.12)


Solution

  • I should first explain the various components of the fitted GAM:

    library(gam)
    data(gam.data)
    x <- gam.data$x
    y <- gam.data$y
    fit <- gam(y ~ s(x, 6), model = FALSE)
    
    ## coefficients for parametric part
    ## this includes intercept and null space of spline
    beta <- coef(fit)
    
    ## null space of spline smooth (a linear term, just `x`)
    nullspace <- fit$smooth.frame[,1]
    
    nullspace - x  ## all 0
    
    ## penalized part of the smooth
    ## note, the backfitting procedure guarantees that this is centred
    pensmooth <- fit$smooth[,1]
    
    sum(pensmooth)  ## centred
    # [1] 5.89806e-17
    
    ## estimated smooth function (null space + penalized space)
    smooth <- nullspace * beta[2] + pensmooth
    
    ## centred smooth function (this is what `plot.gam` is going to plot)
    c0 <- mean(smooth)
    censmooth <- smooth - c0
    
    ## additive predictors (this is just fitted values in Gaussian case)
    addpred <- beta[1] + smooth
    

    You can first verify that addpred is what fit$additive.predictors gives. Since we are fitting an additive model with a Gaussian response, it is also the same as fit$fitted.values.
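
    The check described above can be sketched as follows. This is a minimal, self-contained version that rebuilds the same objects as the code above; `as.numeric()` is my addition to drop names and attributes before comparing, and `all.equal()` is used rather than `==` to allow for floating-point error.

    ```r
    library(gam)
    data(gam.data)
    x <- gam.data$x
    y <- gam.data$y
    fit <- gam(y ~ s(x, 6), model = FALSE)

    ## rebuild the additive predictor by hand, as in the answer
    beta <- coef(fit)
    smooth <- as.numeric(fit$smooth.frame[, 1]) * beta[2] + fit$smooth[, 1]
    addpred <- beta[1] + smooth

    ## compare against what the fitted object stores
    all.equal(as.numeric(addpred), as.numeric(fit$additive.predictors))
    ## Gaussian response with identity link: same as the fitted values
    all.equal(as.numeric(addpred), as.numeric(fit$fitted.values))
    ```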

    What plot.gam plots is censmooth:

    plot.gam(fit, col = 4, ylim = c(-1.5,1.5))
    points(x, censmooth, col = "gray")
    

    Remember that

    addpred = beta[1] + censmooth + c0
    

    If you want to shift the original data y to match this plot, you need to subtract not only the intercept (beta[1]) but also c0 from y:

    points(x, y - beta[1] - c0)
    

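
    As a cross-check on the shift, here is a minimal sketch that computes it without hard-coding any numbers. It also compares the result with partial residuals, i.e. response residuals added back onto the centred smooth. The names `shifted` and `partial` are only illustrative. The identity holds here because the model has a single smooth term and a Gaussian response.

    ```r
    library(gam)
    data(gam.data)
    x <- gam.data$x
    y <- gam.data$y
    fit <- gam(y ~ s(x, 6), model = FALSE)

    ## estimated smooth and its centred version, as in the answer
    beta <- coef(fit)
    smooth <- as.numeric(fit$smooth.frame[, 1]) * beta[2] + fit$smooth[, 1]
    c0 <- mean(smooth)
    censmooth <- smooth - c0

    ## data shifted by intercept and c0
    shifted <- y - beta[1] - c0

    ## partial residuals: (y - fitted) + centred smooth
    partial <- (y - fitted(fit)) + censmooth

    ## the two should agree up to floating-point error
    all.equal(as.numeric(shifted), as.numeric(partial))

    ## overlay the shifted data on the plotted smooth
    plot(fit, col = 4, ylim = range(shifted))
    points(x, shifted, col = "gray")
    ```

    For a plot only, plot.gam also accepts residuals = TRUE, which is documented as adding partial residuals to each term plot, so plot(fit, residuals = TRUE) should give a similar picture without any manual shifting.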