Search code examples
gamhierarchicalmgcvtrend

Annual count index from GAM looking at long-term trends by site


I'm interested in estimating a shared, global trend over time for counts monitored at several different sites using generalized additive models (gams). I've read this great introduction to hierarchical gams (hgams) by Pederson et al. (2019), and I believe I can setup the model as follows (the Pederson et al. (2019) GS model),

fit_model = gam(count ~ s(year, m = 2) + s(year, site, bs = 'fs', m = 2),
                data = count_df,
                family = nb(link = 'log'),
                method = 'REML')

I can plot the partial effect smooths, look at the fit diagnostics, and everything looks reasonable. My question is how to extract a non-centered annual relative count index? My first thought would be to add the estimated intercept (the average count across sites at the beginning of the time series) to the s(year) smooth (the shared global smooth). But I'm not sure if the uncertainty around that smooth already incorporates uncertainty in the estimated intercept? Or if I need to add that in? All of this was possible thanks to the amazing R libraries mgcv, gratia, and dplyr.


Solution

  • Your way doesn't include the uncertainty in the constant term, it just shifts everything around.

    If you want to do this it would be easier to use the constant argument to gratia:::draw.gam():

    draw(fit_model, select = "s(year)", constant = coef(fit_model)[1L])
    

    which does what your code does, without as much effort (on your part).

    An better way — with {gratia}, seeing as you are using it already — would be to create a data frame containing a sequence of values over the range of year and then use gratia::fitted_values() to generate estimates from the model for those values of year. To get what you want (which seems to be to exclude the random smooth component of the fit, such that you are setting the random component to equal 0 on the link scale) you need to pass that smooth to the exclude argument:

    ## data to predict at
    new_year <- with(count_df,
                     tibble(year = gratia::seq_min_max(year, n = 100),
                            site = factor(levels(site)[1], levels = levels(site)))
    
    ## predict
    fv <- fitted_values(fit_model, data = new_year, exclude = "s(year,site)")
    

    If you want to read about exclude, see ?predict.gam