I try to fit a GAM using the gam
package (I know mgcv
is more flexible, but I need to use gam
here). I now have the problem that the model looks good, but in comparison with the original data it seems to be offset along the y-axis by a constant value, for which I cannot figure out where this comes from.
This code reproduces the problem:
library(gam)
data(gam.data)
x <- gam.data$x
y <- gam.data$y
fit <- gam(y ~ s(x,6))
fit$coefficients
#(Intercept) s(x, 6)
# 1.921819 -2.318771
plot(fit, ylim = range(y))
points(x, y)
points(x, y -1.921819, col=2)
legend("topright", pch=1, col=1:2, legend=c("Original", "Minus intercept"))
Chambers, J. M. and Hastie, T. J. (1993) Statistical Models in S (Chapman & Hall) shows that there should not be an offset, and this is also intuitively correct (the smooth should describe the data).
I noticed something comparable in mgcv
, which can be solved by providing the shift
parameter with the intercept value of the model (because the smooth is seemingly centred). I thought the same could be true here, so I subtracted the intercept from the original data-points. However, the plot above shows this idea wrong. I don't know where the extra shift comes from. I hope someone here may be able to help me.
(R version. 3.3.1; gam
version 1.12)
I think I should first explain various output in the fitted GAM model:
library(gam)
data(gam.data)
x <- gam.data$x
y <- gam.data$y
fit <-gam(y ~ s(x,6), model = FALSE)
## coefficients for parametric part
## this includes intercept and null space of spline
beta <- coef(fit)
## null space of spline smooth (a linear term, just `x`)
nullspace <- fit$smooth.frame[,1]
nullspace - x ## all 0
## smooth space that are penalized
## note, the backfitting procedure guarantees that this is centred
pensmooth <- fit$smooth[,1]
sum(pensmooth) ## centred
# [1] 5.89806e-17
## estimated smooth function (null space + penalized space)
smooth <- nullspace * beta[2] + pensmooth
## centred smooth function (this is what `plot.gam` is going to plot)
c0 <- mean(smooth)
censmooth <- smooth - c0
## additive predictors (this is just fitted values in Gaussian case)
addpred <- beta[1] + smooth
You can first verify that addpred
is what fit$additive.predictors
gives, and since we are fitting additive models with Gaussian response, this is also as same as fit$fitted.values
.
What plot.gam
does, is to plot censmooth
:
plot.gam(fit, col = 4, ylim = c(-1.5,1.5))
points(x, censmooth, col = "gray")
Remember, there is
addpred = beta[0] + censmooth + c0
If you want to shift original data y
to match this plot, you not only need to subtract intercept (beta[0]
), but also c0
from y
:
points(x, y - beta[1] - c0)