
Is it possible to control the degrees of freedom for the smooth functions in a GAM fit in R, and if so, how?


I am using the mgcv package in R to fit a GAM to some hydrologic data as follows:

library(mgcv)

# GAM_example_data is my own data set (not shown here)
d <- GAM_example_data[, 1:4]
colnames(d) <- c("month", "rain", "pump", "GWL")
fitted_GAM <- gam(GWL ~ s(month) + s(rain) + s(pump), data = d)
plot.gam(fitted_GAM)

When I plot the fit with plot.gam, the y-axis label of each panel shows the degrees of freedom of that smooth function, and these are often non-integer values. I would like to control the degrees of freedom of each smooth function. Is there a way to do this?

I have seen references to specifying the "knots" and thereby controlling the fit, but I am fairly new to GAMs and haven't been able to find any clear resources explaining what knots are (or whether they are even related to my problem).


Solution

  • I have been closely following how you responded to the other answer. From your reply it appears that you know several GAM concepts well, so I can give a short answer.

    Unfortunately, no. mgcv's gam() does not estimate the model by backfitting; it estimates all smoothing parameters jointly by GCV or REML. So, unlike the legacy gam package, where you can specify a df for each spline term, mgcv gives you no way to set the degrees of freedom directly.
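    To see this on data shaped like the question's, here is a minimal sketch with simulated stand-ins for the month, rain and pump columns (the data-generating process and the GWL response are invented for illustration). method = "REML" selects REML instead of the GCV-type default; either way, each smoothing parameter is estimated, and the effective degrees of freedom (EDF) shown in the plot.gam axis labels are whatever the criterion lands on, which is why they are usually non-integer.

```r
library(mgcv)

set.seed(42)
n <- 300
# Hypothetical stand-in for GAM_example_data
d <- data.frame(
  month = sample(1:12, n, replace = TRUE),
  rain  = runif(n, 0, 100),
  pump  = runif(n, 0, 50)
)
# Invented groundwater-level response
d$GWL <- sin(2 * pi * d$month / 12) + 0.02 * d$rain - 0.001 * d$pump^2 +
  rnorm(n, sd = 0.5)

# All three smoothing parameters are estimated jointly by REML
fitted_GAM <- gam(GWL ~ s(month) + s(rain) + s(pump), data = d, method = "REML")

# Estimated EDF of each smooth (the numbers in the plot.gam y-axis labels)
edfs <- summary(fitted_GAM)$edf
print(round(edfs, 2))
```

    With the default basis dimension (k = 10), each EDF can lie anywhere between roughly 1 and 9; you influence it only indirectly.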

    In a penalized regression setting, the only way to control smoothness is to fix the smoothing parameter sp, but its relationship with the degrees of freedom has no closed form, so you cannot tell in advance what df a given sp will produce.
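    If you really need a particular df for a single smooth, one workaround is to search for the sp numerically: for one smooth with one penalty, the EDF shrinks as sp grows, so a root finder on log10(sp) can hit a target. The sketch below uses simulated data, and edf_for_sp() and sp_for_edf() are hypothetical helpers written for this illustration, not mgcv functions; it relies only on gam()'s sp argument and base R's uniroot().

```r
library(mgcv)

set.seed(1)
n <- 200
dat <- data.frame(x = runif(n))
dat$y <- sin(2 * pi * dat$x) + rnorm(n, sd = 0.3)

# Hypothetical helper: EDF of the single smooth when its sp is fixed (not estimated)
edf_for_sp <- function(sp, data) {
  fit <- gam(y ~ s(x), data = data, sp = sp)
  summary(fit)$edf
}

# Hypothetical helper: search log10(sp) for the value giving EDF == target_edf.
# A very small sp gives EDF near k - 1 = 9, a very large one gives EDF near 1,
# so any target strictly between those is bracketed by the default range.
sp_for_edf <- function(target_edf, data, log10_range = c(-8, 8)) {
  root <- uniroot(function(lsp) edf_for_sp(10^lsp, data) - target_edf,
                  interval = log10_range, tol = 1e-6)
  10^root$root
}

sp4 <- sp_for_edf(4, dat)
fit4 <- gam(y ~ s(x), data = dat, sp = sp4)
print(summary(fit4)$edf)  # approximately 4, because we searched for it
```

    Note that the sp found this way is tied to this particular data set; refitting other data with the same sp will generally give a different EDF, which is exactly the point above.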

    The other answer suggests fitting a pure regression spline without penalization. If you set the basis dimension (rank) k and specify fx = TRUE, the degrees of freedom always equal k minus one (one is lost to the centering constraint), which is an integer.
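    A minimal sketch of the difference, on simulated data with made-up variable names x1 and x2: with the default penalty, k only caps each term's EDF at k - 1, while fx = TRUE turns the penalty off and pins the df at exactly k - 1.

```r
library(mgcv)

set.seed(1)
n <- 200
dat <- data.frame(x1 = runif(n), x2 = runif(n))
dat$y <- sin(2 * pi * dat$x1) + dat$x2^2 + rnorm(n, sd = 0.3)

# Penalized (default): k = 6 and k = 4 only cap the EDFs at 5 and 3;
# the actual values are chosen by the smoothing-parameter criterion.
fit_pen <- gam(y ~ s(x1, k = 6) + s(x2, k = 4), data = dat, method = "REML")
edf_pen <- summary(fit_pen)$edf

# fx = TRUE: unpenalized regression splines, so each term uses exactly
# k - 1 degrees of freedom (one lost to the centering constraint).
fit_fixed <- gam(y ~ s(x1, k = 6, fx = TRUE) + s(x2, k = 4, fx = TRUE), data = dat)
edf_fixed <- summary(fit_fixed)$edf

print(round(edf_pen, 2))  # at most 5 and 3
print(edf_fixed)          # 5 and 3
```

    The same pattern applies to the question's model, e.g. s(month, k = 5, fx = TRUE). Keep in mind that k cannot exceed the number of distinct covariate values, which matters for a variable like month with only 12 levels.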


    Here are some other answers I have written on smoothing:

    "smooth.spline(): fitted model does not match user-specified degree of freedom" explains how setting df works in smooth.spline. Note that this kind of smoother is what backfitting GAMs are built on.
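    For comparison, here is a small sketch (simulated data, base R stats only) of requesting a df directly from smooth.spline(). Internally it searches for the smoothing parameter whose equivalent degrees of freedom (the trace of the smoother matrix) match the requested value, which is the kind of per-term df control the legacy backfitting gam package offers and mgcv does not.

```r
set.seed(1)
x <- sort(runif(200))
y <- sin(2 * pi * x) + rnorm(200, sd = 0.3)

# Ask for 5 equivalent degrees of freedom; lambda is then chosen to match
ss <- smooth.spline(x, y, df = 5)
print(ss$df)      # close to the requested 5
print(ss$lambda)  # the smoothing parameter that produced it
```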

    "How to interpret lm() coefficient estimates when using bs() function for splines" explains how pure regression splines work. Of course, mgcv offers many more spline basis classes than just the B-spline basis used by splines::bs.
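    As a quick illustration of why a pure regression spline has an integer df, here is a sketch with simulated data: splines::bs() builds a fixed set of basis columns, and lm() fits them by ordinary least squares with no penalty, so the spline uses exactly as many df as there are columns.

```r
library(splines)

set.seed(1)
x <- runif(200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.3)

# bs(x, df = 5): 5 cubic B-spline columns (no intercept column), so the
# spline term uses exactly 5 df, plus 1 for the model intercept.
fit_bs <- lm(y ~ bs(x, df = 5))
print(length(coef(fit_bs)))  # 6
```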