Search code examples
hierarchical-datagam

Hierarchical GAMs - why am I not getting individual results for my categorical variable?


I am trying to do a HGAM in order to separate my large dataset by Reservoir. One of my predictor variables is Season_Y, indicating the season and year it the sample was taken. Here is my code:

 model = gam( 
    log1p(GEO) ~ s(Season_Y, by=Res, k=, bs ="fs") +
    t2(TP, TIN.TP, by=Res, bs="fs", k=5) + 
    s(Sil, by=Res, k=5) +
    s(SO4, by=Res, k=5) +
    s(Dfe, by=Res, k=5) +
    s(Ortho, by=Res, k=5) +
    s(NO3.NH3, by=Res, k=5),
  data=gam.data,
  family=Gamma,
  method="REML" 
) 

Normally in a GLM I get individual scores for each subsequent Season_Y, e.g. Winter_2019. but I only an overall Season_Y score. As displayed below:

                                    edf Ref.df         F  p-value    
s(Season_Y):ResAlaw           4.603e+00  6.000 3.091e+02  0.99783    
s(Season_Y):ResAlwen          2.474e+00  5.000 2.085e+03  0.99962    
s(Season_Y):ResCefni          2.600e+00  7.000 1.168e+03  0.99809    
s(Season_Y):ResDolwen         6.863e-05  1.000 4.220e+07  0.99803    
s(Season_Y):ResLlandegfedd    6.795e-05  1.000 1.219e+07  0.99856    
s(Season_Y):ResLlwyn Onn      4.350e+00  6.000 3.917e+01  0.99830    
s(Season_Y):ResPentwyn        1.267e-04  2.000 4.130e+05  0.99996    
s(Season_Y):ResPlas Uchaf     4.798e-02  5.000 2.801e+04  0.99768    
s(Season_Y):ResPontsticill    3.182e+00  7.000 1.672e+01  0.99950   

When I do gam.check I don't get any values:

                                   k'      edf k-index p-value    
s(Season_Y):ResAlaw           7.00e+00 4.60e+00      NA      NA    
s(Season_Y):ResAlwen          7.00e+00 2.47e+00      NA      NA    
s(Season_Y):ResCefni          7.00e+00 2.60e+00      NA      NA    
s(Season_Y):ResDolwen         7.00e+00 6.86e-05      NA      NA    
s(Season_Y):ResLlandegfedd    7.00e+00 6.79e-05      NA      NA    
s(Season_Y):ResLlwyn Onn      7.00e+00 4.35e+00      NA      NA    
s(Season_Y):ResPentwyn        7.00e+00 1.27e-04      NA      NA    
s(Season_Y):ResPlas Uchaf     7.00e+00 4.80e-02      NA      NA    
s(Season_Y):ResPontsticill    7.00e+00 3.18e+00      NA      NA  

Just want a breakdown of how each season from each year is significant, please?

Any help would be much appreciated.

Thank you!


Solution

  • If Season_Y really contains data like "Winter_2019" you are using the fs basis incorrectly. This basis is the equivalent of a random slope, but instead of a linear effect we get a smooth effect for each level of the grouping variable. The critical thing here is that you are producing a smooth (lots of them) so you have to pass it a continuous variable and Season_Y isn't continuous. If this is even doing anything sensible, it is just creating a random effect (intercept) and hence it would be much clearer in terms of the intention of the code to use:

    s(Season_Y, by = Res, bs = "re")
    

    As to why everything is NA in the basis dimension check, you asked for random effects and the basis dimension check is not available for those terms. From ?check.k:

    Currently smooths of factor variables are not supported and will give an NA p-value.

    I think you are genuinely confusing outputs here. The first output block you show is the output from summary.gam(). The second output is from gam.check() (or k.check()) and these are not the same thing at all.