I am quite new to all this so not sure if this is a simple question!
I am trying to plot the smooths from two GAM models on the same axes. I can see that there have already been various questions similar to this in the past, but I can't find a specific answer to my question.
I have data (dset) which looks has about 600 study participants (studynr), each with somewhere between 1 and 15 study visits (one row of dset for each visit). At each visit a complete list of their prescribed medications was captured. I have categorised these medications into two categories and counted them (TOTAL_P and TOTAL_S) and am interested in the trajectories in the total number of these medications over time. Time is negative as it is in years before a certain event.
I have tried to produce a small amount of dummy data here, but the full dataset is much larger, with differing numbers of visits per participant and the time is not in integers values.
dset <- data.frame(studynr=rep(c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20), each =10),
time = rep(c(-9,-8,-7,-6,-5,-4,-3,-2,1,0), times=20),
TOTAL_P = sample(1:10, 200, replace=T),
TOTAL_S = sample(1:10, 200, replace=T))
I then have two GAM models (using bam()
as my dset is quite large)
bam_s <- bam(TOTAL_S ~ s(time)
+ s(studynr, bs = 're')
+ s(studynr, time, bs = 're'),
data = dset, method = "REML")
bam_p <- bam(TOTAL_P ~ s(time)
+ s(studynr, bs = 're')
+ s(studynr, time, bs = 're'),
data = dset, method = "REML")
What I want to do is plot the smooth from each model on one combined plot, with the number of medications on the y-axis and time on the x-axis.
I have tried a number of different options using {gratia}'s compare_smooth()
and plot.gam()
without success. Ultimately I want to try to get the plot in a ggplot2 format, the main issue I am facing is getting the y-axis scale correct.
Thanks.
The brute force way is to simply predict from both models, excluding both random effect terms and then plot:
library("gratia")
library("mgcv")
library("dplyr")
library("ggplot2")
N <- 100
ds <- data_slice(dset, time = evenly(time, n = N)) |>
select(-TOTAL_P, -TOTAL_S)
fv_s <- fitted_values(bam_s, data = ds,
exclude = c("s(studynr)", "s(studynr,time)"))
fv_p <- fitted_values(bam_p, data = ds,
exclude = c("s(studynr)", "s(studynr,time)"))
fv <- fv_s |>
bind_rows(fv_p) |>
mutate(category = rep(c("S", "P"), each = N))
fv |>
ggplot(aes(x = time, y = fitted, group = category)) +
geom_ribbon(aes(ymin = lower, ymax = upper, fill = category),
alpha = 0.2) +
geom_line(aes(colour = category)) +
labs(y = "Count")
But it would be so much better to actually fit a model that would allow a statistical comparison:
library("tidyr")
df <- dset |>
tidyr::pivot_longer(cols = c(-studynr, -time),
values_to = "count", names_to = "category", names_prefix = "TOTAL_",
names_ptypes = list(category = factor()))
m <- bam(count ~ category + s(time, by = category) +
s(studynr, bs = 're') + s(studynr, time, bs = 're'),
data = df, family = poisson(), method = "fREML")
Now we have:
> summary(m)
Family: poisson
Link function: log
Formula:
count ~ category + s(time, by = category) + s(studynr, bs = "re") +
s(studynr, time, bs = "re")
Parametric coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.73291 0.02975 58.254 <2e-16 ***
categoryS 0.01352 0.04192 0.323 0.747
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Approximate significance of smooth terms:
edf Ref.df Chi.sq p-value
s(time):categoryP 3.262e+00 4.034 7.609 0.114
s(time):categoryS 2.961e+00 3.662 3.912 0.358
s(studynr) 8.032e-06 1.000 0.000 0.493
s(studynr,time) 8.264e-06 1.000 0.000 0.673
R-sq.(adj) = 0.0123 Deviance explained = 2.63%
fREML = 654.45 Scale est. = 1 n = 400
draw(m)
and we can estimate the difference between the two estimated smooths (and plot it) with:
dif <- difference_smooths(m, smooth = "s(time)")
dif |> draw()
and now we can again predict from the model and plot as per the first example as you want these on the count scale:
ds <- data_slice(m, time = evenly(time, n = N), category = evenly(category))
fv <- fitted_values(m, data = ds,
exclude = c("s(studynr)", "s(studynr,time)"))
fv |>
ggplot(aes(x = time, y = fitted, group = category)) +
geom_ribbon(aes(ymin = lower, ymax = upper, fill = category),
alpha = 0.2) +
geom_line(aes(colour = category)) +
labs(y = "Count")
Some notes:
bam()
you need method = "fREML"
, otherwise you miss out on a lot of the speed/memory advantages of fitting with bam()
.