I've read that the best way to report the effect of spline terms on an outcome is to plot the predicted probabilities. I've been able to do this using the frailty coxph model in the survival
package, by creating a new dataset with a dummy individual where the only differing variable is the spline variable, and then plotting the predictions for that dataset.
The code I've written looks like this:
### Example
library(tidyverse); library(survival)
# Create dataset
dat <- tibble(time = sample(seq(21), 1000, replace = TRUE),
status = sample(c(0,1), 1000, replace = TRUE),
var1 = sample(seq(100), 1000, replace = TRUE),
var2 = sample(seq(100), 1000, replace = TRUE),
group_id = seq(1, 1000, by = 1))
# Run model
fit <- coxph(Surv(time, status) ~ var1 + ns(var2, df = 3) + frailty(group_id), data = dat)
# Create dummy dataset
spline_values <- tibble(var1 = seq(1, 1000, by = 1))
newdata <- dat %>%
head(n = 1) %>%
select(time, status, var1, var2, group_id) %>%
slice(rep(1:n(), each = nrow(spline_values))
newdata <- newdata %>% mutate(var1 = spline_values$var1)
pred <- predict(fit, newdata = newdata, se.fit = TRUE, type = "risk")
newdata <- newdata %>% add_column(rr = pred$fit, se.fit = pred$se.fit)
newdata <- newdata %>% mutate(
low = exp(log(rr) - 1.96 * se.fit),
high = exp(log(rr) + 1.96 * se.fit))
ggplot(newdata, aes(var1)) +
geom_line(aes(y = rr)) +
geom_ribbon(aes(ymin = low, ymax = high), alpha = 0.1, linetype = 0)
However, in reality, my model has two levels of random effects, which is only possible to model using the coxme
package. Although there is a predict.coxme
function, it doesn't work on a new dataset:
library(coxme)
# Run coxme model
fit2 <- coxme(Surv(time, status) ~ var1 + ns(var2, df = 3) + (1 | group_id), data = dat)
pred <- predict(fit2, newdata = newdata, se.fit = TRUE, type = "risk")
This results in the following error:
Error in predict.coxme(fit2, newdata, se.fit = TRUE, type = "risk") : newdata argument not yet supported
The package author has stated that he would like to implement this but it is unlikely to occur any time soon. So my previous method won't work, and I can't seem to find any packages that will support this.
I'm left wondering if there's a way of visualising the spline effects from a coxme model. Does anyone have any ideas or recommendations?
If not then... well, adding the levels does not significantly change the coefficients of the equivalent coxph model, so I'm tempted to plot the visualisations using predictions from the coxph model and report the more accurate coefficients in an accompanying table. Does that sound acceptable or am I heading into dangerous waters?
For anyone who has come to this, the ehahelper package provides a predict_coxme function (as well as some other useful tools for working with coxme objects). It's definitely worth reading the documentation though, to see how the developer deals with the random effects and calculates the denominator for the hazard ratio.
devtools::install_github('junkka/ehahelper')
library(ehahelper); library(coxme); library(tidyverse)
# Create dataset
dat <- tibble(time = sample(seq(21), 1000, replace = TRUE),
status = sample(c(0,1), 1000, replace = TRUE),
var1 = sample(seq(100), 1000, replace = TRUE),
var2 = sample(seq(100), 1000, replace = TRUE),
group_id = seq(1, 1000, by = 1))
# Run model
fit2 <- coxme(Surv(time, status) ~ var1 + ns(var2, df = 3) + (1 | group_id), data = dat)
# Create dummy dataset
spline_values <- tibble(var1 = seq(1, 1000, by = 1))
newdata <- dat %>%
head(n = 1) %>%
select(time, status, var1, var2, group_id) %>%
slice(rep(1:n(), each = nrow(spline_values)))
newdata <- newdata %>% mutate(var1 = spline_values$var1)
# Ehahelper predict_coxme has difficulty with splines in a new dataset.
# To get around this, append the new dataset to the old one, apply the predict_coxme function, then subset the predicted values of interest
dat <- bind_rows(dat, newdata)
pred <- ehahelper::predict_coxme(fit2, newdata = dat, se.fit = TRUE, type = "risk", strata_ref = FALSE)
dat <- dat %>% add_column(rr = pred$fit[,1], se.fit = pred$se.fit[,1])
# Subset values of interest
dat <- dat %>% tail(nrow(newdata))
dat <- dat %>% mutate(
low = exp(log(rr) - 1.96 * se.fit),
high = exp(log(rr) + 1.96 * se.fit))
ggplot(dat, aes(var1)) +
geom_line(aes(y = rr)) +
geom_ribbon(aes(ymin = low, ymax = high), alpha = 0.1, linetype = 0)