
mgcv: gam.check() low p-value but not enough variable combinations to increase basis functions k


I have an issue similar to the one described here, except that I cannot increase k without getting this error:

Error in smooth.construct.tp.smooth.spec(object, dk$data, dk$knots) : 
  A term has fewer unique covariate combinations than specified maximum degrees of freedom
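A minimal sketch reproducing this error on simulated data, assuming a covariate that (like Visibility after filtering) takes only the values 90, 95 and 100. The data frame `sim` and response `y` are hypothetical; only `gam()` and `s()` come from mgcv. A thin plate regression spline (the default `bs = "tp"`) cannot be given more basis functions than there are unique covariate values, so `k = 4` should fail here while `k = 3` fits.

```r
library(mgcv)

set.seed(1)
# Hypothetical data: the covariate takes only three distinct values
sim <- data.frame(Visibility = sample(c(90, 95, 100), 200, replace = TRUE))
sim$y <- 0.02 * sim$Visibility + rnorm(200, sd = 0.3)

# Number of distinct covariate values available to the smooth
n_unique <- length(unique(sim$Visibility))

# k = 4 asks for more basis functions than there are unique values
fit_k4 <- tryCatch(
  gam(y ~ s(Visibility, k = 4), data = sim, method = "REML"),
  error = function(e) e
)
if (inherits(fit_k4, "error")) message(conditionMessage(fit_k4))

# k = 3 matches the number of unique values and can be fitted
fit_k3 <- gam(y ~ s(Visibility, k = 3), data = sim, method = "REML")
```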

I am trying to model count data for an observed species with a GAM that includes multiple covariates, such as lunar illumination (percentage), cloud cover (percentage) and survey duration (minutes).

library(mgcv)
gam_Sp1 <- gam(ln ~ s(LunarPerc, k = 20) + s(Duration, k = 30) + s(Clouds, k = 20) +
                 s(Visibility, k = 3) + Seastate + WindDir,
               data = df_count, method = "REML")

Visibility is the problem: I cannot set k higher than 3 because it has so few unique values. I have excluded all surveys with visibility below 90%, so the only values left in my dataset are 90, 95 and 100%. Here is the gam.check() output with k = 3 for Visibility:

gam.check(gam_Sp1)

Method: REML   Optimizer: outer newton
full convergence after 10 iterations.
Gradient range [-5.630335e-05,5.578655e-05]
(score 204.5643 & scale 0.2980554).
Hessian positive definite, eigenvalue range [4.648033e-05,107.0027].
Model rank =  81 / 81 

Basis dimension (k) checking results. Low p-value (k-index<1) may
indicate that k is too low, especially if edf is close to k'.

                 k'   edf k-index p-value  
s(LunarPerc)  19.00  1.51    0.98   0.370  
s(Duration)   29.00  1.00    1.07   0.830  
s(Clouds)     19.00  1.96    1.08   0.870  
s(Visibility)  2.00  1.00    0.88   0.035 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
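A minimal sketch, on hypothetical simulated data, of where the k' = 2 and edf = 1.00 reported for s(Visibility) come from. It reads the `first.para`/`last.para` fields of the fitted smooth object and the per-coefficient `edf` vector of the `gam` fit; the data frame `sim` is hypothetical. With `k = 3`, one coefficient is absorbed by the centring (sum-to-zero) constraint, leaving k' = 2, and because one of the two remaining directions (the linear one) is unpenalized, the smooth's edf can only lie between 1 and 2.

```r
library(mgcv)

set.seed(2)
# Hypothetical data with a three-valued covariate and a purely linear effect
sim <- data.frame(Visibility = sample(c(90, 95, 100), 200, replace = TRUE))
sim$y <- 0.02 * sim$Visibility + rnorm(200, sd = 0.3)

fit <- gam(y ~ s(Visibility, k = 3), data = sim, method = "REML")

# Indices of the coefficients belonging to s(Visibility)
sm <- fit$smooth[[1]]
coef_idx <- sm$first.para:sm$last.para

# k': coefficients left after the centring constraint, i.e. k - 1
k_prime <- length(coef_idx)
# edf: effective degrees of freedom of the smooth (sum of per-coefficient edf)
edf_vis <- sum(fit$edf[coef_idx])

c(k_prime = k_prime, edf = edf_vis)
```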

Visibility has a low p-value, but I cannot increase k. Is there anything else I can do? Here they suggest adding more covariates, but I think I already have enough in the model. The difference between edf and k' is not big either, so what might be causing this? Or would it be better to include Visibility as a linear term, since edf = 1?

Cheers


Solution

  • There's not much point trying to smooth a variable that has only three unique values; the effect would have to depart from linearity to a huge degree to be distinguished from a linear fit.

    In situations like this, just fit it as Visibility or poly(Visibility, 2) for a linear or a quadratic parametric term, respectively.

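    Note that the difference between k' and edf is small because it cannot be any larger than 1. k' is the maximum value possible given the identifiability constraint (with k = 3, one coefficient is lost to centring, leaving k' = 2), and edf is as small as possible (1) given that the smooth has an unpenalized null space (the linear component). The edf of s(Visibility) can therefore only range between 1 and 2.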
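A minimal sketch of the two parametric alternatives suggested above, fitted to hypothetical simulated data standing in for `df_count` (the data frame `sim` and its values are made up; `Clouds` is kept as a smooth only to show a mixed model). Visibility enters as a plain linear term in one model and as a quadratic via `poly(Visibility, 2)` in the other.

```r
library(mgcv)

set.seed(3)
# Hypothetical data standing in for df_count (names and values are made up)
n <- 200
sim <- data.frame(
  Visibility = sample(c(90, 95, 100), n, replace = TRUE),
  Clouds     = runif(n, 0, 100)
)
sim$ln <- 0.02 * sim$Visibility + sin(sim$Clouds / 20) + rnorm(n, sd = 0.3)

# Visibility as a linear parametric term (one coefficient)
m_lin <- gam(ln ~ s(Clouds, k = 20) + Visibility, data = sim, method = "REML")

# Visibility as a quadratic parametric term (two orthogonal-polynomial coefficients)
m_quad <- gam(ln ~ s(Clouds, k = 20) + poly(Visibility, 2),
              data = sim, method = "REML")

# Parametric coefficient tables; Visibility is no longer a smooth,
# so it no longer appears in the gam.check() k-index table
summary(m_lin)$p.table
summary(m_quad)$p.table
```

With only three distinct values, a quadratic can reproduce any set of three means at 90, 95 and 100, so it is already as flexible as s(Visibility, k = 3), which has at most 2 degrees of freedom.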