How to plot the smooths from multiple GAMs on the same plot

I am quite new to all this so not sure if this is a simple question!

I am trying to plot the smooths from two GAM models on the same axes. I can see that there have already been various questions similar to this in the past, but I can't find a specific answer to my question.

I have data (dset) which looks has about 600 study participants (studynr), each with somewhere between 1 and 15 study visits (one row of dset for each visit). At each visit a complete list of their prescribed medications was captured. I have categorised these medications into two categories and counted them (TOTAL_P and TOTAL_S) and am interested in the trajectories in the total number of these medications over time. Time is negative as it is in years before a certain event.

I have tried to produce a small amount of dummy data here, but the full dataset is much larger, with differing numbers of visits per participant and the time is not in integers values.

dset <- data.frame(studynr=rep(c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20), each =10),
                      time = rep(c(-9,-8,-7,-6,-5,-4,-3,-2,1,0), times=20), 
                    TOTAL_P = sample(1:10, 200, replace=T),
                    TOTAL_S = sample(1:10, 200, replace=T))

I then have two GAM models (using bam() as my dset is quite large)

bam_s <- bam(TOTAL_S ~ s(time)
           + s(studynr, bs = 're') 
           + s(studynr, time, bs = 're'),
           data = dset, method = "REML")

bam_p <- bam(TOTAL_P ~ s(time)
             + s(studynr, bs = 're')
             + s(studynr, time, bs = 're'),
             data = dset, method = "REML")

What I want to do is plot the smooth from each model on one combined plot, with the number of medications on the y-axis and time on the x-axis.

I have tried a number of different options using {gratia}'s compare_smooth() and plot.gam() without success. Ultimately I want to try to get the plot in a ggplot2 format, the main issue I am facing is getting the y-axis scale correct.

Thanks.

Solution

The brute force way is to simply predict from both models, excluding both random effect terms and then plot:

library("gratia")
library("mgcv")
library("dplyr")
library("ggplot2")

N <- 100
ds <- data_slice(dset, time = evenly(time, n = N)) |>
  select(-TOTAL_P, -TOTAL_S)

fv_s <- fitted_values(bam_s, data = ds,
                      exclude = c("s(studynr)", "s(studynr,time)"))
fv_p <- fitted_values(bam_p, data = ds,
                      exclude = c("s(studynr)", "s(studynr,time)"))

fv <- fv_s |>
  bind_rows(fv_p) |>
  mutate(category = rep(c("S", "P"), each = N))

fv |>
  ggplot(aes(x = time, y = fitted, group = category)) +
  geom_ribbon(aes(ymin = lower, ymax = upper, fill = category),
              alpha = 0.2) +
  geom_line(aes(colour = category)) +
  labs(y = "Count")

But it would be so much better to actually fit a model that would allow a statistical comparison:

library("tidyr")

df <- dset |>
  tidyr::pivot_longer(cols = c(-studynr, -time),
    values_to = "count", names_to = "category", names_prefix = "TOTAL_",
    names_ptypes = list(category = factor()))

m <- bam(count ~ category + s(time, by = category) +
           s(studynr, bs = 're') + s(studynr, time, bs = 're'),
         data = df, family = poisson(), method = "fREML")

Now we have:

> summary(m)                                                                  

Family: poisson 
Link function: log 

Formula:
count ~ category + s(time, by = category) + s(studynr, bs = "re") + 
    s(studynr, time, bs = "re")

Parametric coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  1.73291    0.02975  58.254   <2e-16 ***
categoryS    0.01352    0.04192   0.323    0.747    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
                        edf Ref.df Chi.sq p-value
s(time):categoryP 3.262e+00  4.034  7.609   0.114
s(time):categoryS 2.961e+00  3.662  3.912   0.358
s(studynr)        8.032e-06  1.000  0.000   0.493
s(studynr,time)   8.264e-06  1.000  0.000   0.673

R-sq.(adj) =  0.0123   Deviance explained = 2.63%
fREML = 654.45  Scale est. = 1         n = 400

draw(m)

and we can estimate the difference between the two estimated smooths (and plot it) with:

dif <- difference_smooths(m, smooth = "s(time)")
dif |> draw()

and now we can again predict from the model and plot as per the first example as you want these on the count scale:

ds <- data_slice(m, time = evenly(time, n = N), category = evenly(category))
fv <- fitted_values(m, data = ds,
                    exclude = c("s(studynr)", "s(studynr,time)"))
fv |>
  ggplot(aes(x = time, y = fitted, group = category)) +
  geom_ribbon(aes(ymin = lower, ymax = upper, fill = category),
              alpha = 0.2) +
  geom_line(aes(colour = category)) +
  labs(y = "Count")

Some notes:

If this is a count response (as you describe) a Gaussian GAM is unlikely to be appropriate; in the second example I fitted a Poisson GAM, which would be the starting point for modelling counts.
When fitting with bam() you need method = "fREML", otherwise you miss out on a lot of the speed/memory advantages of fitting with bam().
The factor-by smooth in the second model is just one way you could fit this kind of thing.