r, gam, mgcv

Is it necessary to include a factor (fitted as a smooth-factor interaction) as a parametric term within a gam?


I am interested in investigating a non-linear temporal trend in a data set and so I would like to use the R package mgcv to fit the following GAM:

model1 <- gam(Variable ~ s(Date, by = Site.Factor), data = data)

where Variable is the continuous variable of interest, Site.Factor is a factor with two levels and Date is a continuous variable.

I have read that, because the by factor is included within the smoothing function, differences in the means of the two factor levels are not accounted for. I should therefore include Site.Factor as a parametric term, like so:

model2 <- gam(Variable ~ Site.Factor + s(Date, by = Site.Factor), data = data)

However, whilst I might expect the influence of Site.Factor on the smooth to be significant, I do not expect the difference in means between the two levels of the factor to be significant. Do I still need to include the factor separately within the model as in model2, or would model1 be okay?


Solution

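  • Unless you know that the populations from which your data are drawn have exactly the same mean then yes, you should include Site.Factor as a parametric (fixed effect) term, whether or not the difference in sample means is significant. The smooths generated by a factor by variable are each centred on zero, so without the parametric term the model forces both sites to share a single intercept.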
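
To illustrate the point, here is a minimal sketch on simulated data. The data frame sim, the site labels "A" and "B", the trend shapes, the mean difference of 2, and the choice of method = "REML" are all assumptions made up for the illustration; they are not taken from the question's data.

```r
library(mgcv)

set.seed(1)

# Simulated data (hypothetical): two sites with differently shaped trends
# and a deliberate difference of 2 between their mean levels.
n <- 200
sim <- data.frame(
  Date = rep(seq(0, 1, length.out = n), times = 2),
  Site.Factor = factor(rep(c("A", "B"), each = n))
)
site_mean <- ifelse(sim$Site.Factor == "A", 0, 2)
trend <- ifelse(sim$Site.Factor == "A",
                sin(2 * pi * sim$Date),
                cos(2 * pi * sim$Date))
sim$Variable <- site_mean + trend + rnorm(nrow(sim), sd = 0.3)

# model1: by-factor smooths only. Each smooth is centred on zero,
# so both sites have to share the single model intercept.
model1 <- gam(Variable ~ s(Date, by = Site.Factor),
              data = sim, method = "REML")

# model2: adds the parametric term, giving each site its own mean level.
model2 <- gam(Variable ~ Site.Factor + s(Date, by = Site.Factor),
              data = sim, method = "REML")

# Estimated difference in mean level between site B and site A
coef(model2)["Site.FactorB"]

# Compare the two fits
AIC(model1, model2)
```

With a real difference in means built into the simulation, model1 has no way to represent it and should fit noticeably worse than model2. If the true difference were zero, the Site.Factor coefficient would simply be estimated close to zero at the cost of one extra parameter.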