I am interested in investigating a non-linear temporal trend in a data set, so I would like to use the R package mgcv to fit the following GAM:
model1 <- gam(Variable ~ s(Date, by = Site.Factor), data = data)
where Variable is the continuous variable of interest, Site.Factor is a factor with two levels, and Date is a continuous variable.
I have read that, because the by factor is included within the smoothing function, differences in the means of the two factor levels are not accounted for, so I should also include Site.Factor as a parametric term, like so:
model2 <- gam(Variable ~ Site.Factor + s(Date, by = Site.Factor), data = data)
However, whilst I expect the effect of Site.Factor on the smooth to be significant, I do not expect the means of the two factor levels to differ significantly. Do I still need to include the factor separately within the model, as in model2, or would model1 be okay?
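Unless you know that the populations from which your data are drawn have exactly the same mean, then yes, you should include Site.Factor as a parametric (fixed-effect) term, whether or not the difference in the sample means is significant. When the by variable is a factor, mgcv applies a centring constraint to each level's smooth, so the smooths only describe how each site varies around its own mean; without the parametric Site.Factor term, both sites are forced to share the single model intercept.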
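A minimal sketch of why this matters, using simulated data rather than the asker's data set: the sine-shaped trend, the offset of 5 for the hypothetical site "B" and the noise level are all invented for illustration. Because the by-factor smooths are centred, model1 cannot shift one site's curve up or down relative to the other, whereas model2 estimates that shift through the Site.FactorB coefficient.

```r
# Illustrative simulation only: two sites share a similar non-linear trend
# in Date, but site "B" sits about 5 units higher than site "A".
library(mgcv)

set.seed(1)
n <- 200  # observations per site (arbitrary choice)
data <- data.frame(
  Date        = rep(seq(0, 10, length.out = n), times = 2),
  Site.Factor = factor(rep(c("A", "B"), each = n))
)
is_B <- data$Site.Factor == "B"
# Sine trend, a slightly different shape plus a mean offset of 5 at site B,
# and Gaussian noise
data$Variable <- sin(data$Date) +
  ifelse(is_B, 5 + 0.3 * cos(data$Date), 0) +
  rnorm(nrow(data), sd = 0.5)

# model1: one centred smooth per site, but no parametric site effect
model1 <- gam(Variable ~ s(Date, by = Site.Factor), data = data)
# model2: adds Site.Factor so each site can have its own mean
model2 <- gam(Variable ~ Site.Factor + s(Date, by = Site.Factor), data = data)

summary(model2)$p.table  # the Site.FactorB row estimates the mean offset
AIC(model1, model2)      # model1 is forced to give both sites the same mean
```

Here model1 fits far worse because its centred smooths cannot absorb the difference in site means. When the true difference is close to zero, including Site.Factor costs only one extra parameter, which is why it is usually kept regardless of its significance.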