Tags: r, prediction, gam, mgcv

GAM prediction in R with zero covariates


I have estimated a GAM (using the mgcv package) with an intercept and two smooth terms, something like:

y = intercept + b1*s(x1) + b2*s(x2)

But when I predict y on a new dataset consisting of a single row with x1 = 0, x2 = 0 and type = "response", I would expect the prediction to be the estimated intercept value (as happens in the simple linear model case). However, it returns a non-intercept value which I cannot justify. Any idea what I am missing here?
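
For reference, the call is along these lines (object names here are illustrative):

nd <- data.frame(x1 = 0, x2 = 0)
predict(model, newdata = nd, type = "response")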

Any help or ideas much appreciated.


Solution

  • There are two problems here. Firstly, the model you have fitted is not(!)

    \hat{y}_i = \beta_0 + \beta_1 f_1(x_{1i}) + \beta_2 f_2(x_{2i})

    there is no coefficient (or vector of coefficients) that multiplies the smooth function of a covariate. Assuming you didn't do anything fancy in {mgcv}, and going by your description of the model you didn't, then you fitted

    library(mgcv)  ## provides gam() and s()
    m <- gam(y ~ s(x1) + s(x2), data = my_data, ....)
    

    This model is

    \hat{y}_i = \beta_0 + f_1(x_{1i}) + f_2(x_{2i})

    where the functions f_j are assumed to be smooth and are represented in the model via penalised splines. Each spline is composed of a set of K (typically) basis functions evaluated at the values of the jth covariate, b_k(x_{ji}). For your s(x1) we have K = 10 and k = 1, 2, ..., K, so that smooth is composed of 10 thin plate spline basis functions. Each basis function is associated with a coefficient that weights it (values of the coefficients are found that maximise the fit to the data without producing an overly wiggly function). Hence each f_j is represented as a sum of weighted basis functions, and the response is modelled as the sum of those functions plus the constant term.
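
    Written out, this just says that each smooth is a weighted sum of its basis functions:

    f_j(x_{ji}) = \sum_{k=1}^{K} \beta_{jk} b_{jk}(x_{ji})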

    The second misunderstanding is that you are confusing the value of the smooth function with the value of the covariate. When you predict with x1 = x2 = 0, each basis function in the respective smooth is evaluated at that value of the covariate, weighted by its coefficient, and these weighted basis function values are summed. It is unlikely that the value of either smooth will be 0 when evaluated at these values of the covariates; the sketch below shows how to inspect this.
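
    A small sketch of that decomposition, reusing the model m fitted above ({mgcv}'s predict.gam() provides the type = "terms" and type = "lpmatrix" options used here):

    nd <- data.frame(x1 = 0, x2 = 0)

    ## contribution of each smooth at x1 = 0, x2 = 0 -- almost certainly not 0
    predict(m, newdata = nd, type = "terms")

    ## the same by hand: evaluate every basis function at the new data
    ## (the linear predictor matrix), then weight by the coefficients and sum
    Xp <- predict(m, newdata = nd, type = "lpmatrix")
    drop(Xp %*% coef(m))  ## equals predict(m, newdata = nd)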

    Each smooth in your model is also subject to an identifiability constraint: because the model includes a constant term (the intercept) and that constant function is in the span of the two bases (for s(x1) and s(x2)), there are actually 9 basis functions per smooth with the default k. The identifiability constraint ensures that each smooth sums to 0 over the observed values of the covariate. In other words, each smooth is centred about the overall mean of the response (the intercept). As such, there will be some value of each covariate at which f_j(x_{ji}) == 0, but it is unlikely to be at x_{ji} == 0 unless you have done something specific to force that (for example via a point constraint on the smooth, through the argument pc), as sketched below.
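
    A quick sketch of both points (the first check reuses m from above; the pc lines assume you genuinely want each smooth to pass through 0 at 0):

    ## the centring constraint: each smooth's contributions sum to
    ## (numerically) zero over the observed data
    colSums(predict(m, type = "terms"))

    ## a point constraint forces a smooth through 0 at the given covariate
    ## value, so predicting at x1 = x2 = 0 then returns the intercept
    m2 <- gam(y ~ s(x1, pc = 0) + s(x2, pc = 0), data = my_data)
    predict(m2, newdata = data.frame(x1 = 0, x2 = 0))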

    In summary, you are assuming that the value of a smooth is 0 when evaluated at a covariate value of 0, and in general that is unlikely to be so unless specific action has been taken to force it. In other words, s(x1 = 0) need not equal 0 and typically won't. To see this, just plot your smooths with plot(m, pages = 1) and read off the value of each smooth function at x1 = 0 and x2 = 0.