I have a cox model that has a time dependence aspect to it that was resolved by splitting the curve into 3 pieces. After doing so and confirming that the proportionality is restored I would like to make a prediction with the cox model. The survival package documentation shows this but only for non split data.
How can I transform it back into a more interpretable single plot where I can check treatment vs control etc...
Minimum Example
# Clean your workspace environment
# rm(list = ls())
# Clear the plots tab (if any plots are open)
# if (!is.null(dev.list())) dev.off()
# Clear the console (equivalent to Ctrl+L shortcut)
# cat("\014")
library("survival")
library("survminer")
vet2 <-survSplit(Surv(time, status) ~., data= veteran, cut=c(90, 180), episode= "tgroup", id="id")
vfit2 <-coxph(Surv(tstart, time, status) ~ trt + prior + karno:strata(tgroup),data=vet2)
scurve0 <- survfit(vfit2, newdata = vet2)
#plot(scurve0)
ggsurvplot(scurve0, data=vfit2,legend = "none" )
plotdata <- data.frame(
trt = c(0,0,0,0,0,0),
prior = c(0,10,0,10,0,10),
karno = c(60,99,60,99,60,99),
tgroup = c(1,2,3)
)
scurve1 <- survfit(vfit2, newdata = plotdata)
ggsurvplot(scurve0, data=vfit2,legend = "none" )
scurve1 <- survfit(vfit2, newdata = plotdata)
this produces the following plot for ** both ** plot calls.
How can I get this back as a single curve? I need to turn the legend off because the strata creates so many cases. When I try to plot the model with a new data I get the exact same plot so I do not think it is working
It doesn't look like there is a straightforward solution to this problem, e.g. as commented by @IRTFM:
"Terry Therneau says he doesn't know how to draw predicted survival curves when using models with time-varying coefficients. Search the r-help archives where this has been repeatedly discussed. This is what he said 2 years ago: "The "survival curve" for a time dependent covariate is something that is not easily defined. Read chapter 10.2.4 of the Therneau and Grambch book for a discussion of this (largely informed by the many mistakes I've myself made.)"
However, I think you could use the approach outlined by @EdM in https://stats.stackexchange.com/questions/534516/survival-curve-from-time-dependent-coefficients-in-the-cox-model as a potential workaround (based on the 'log(time+20)' approach described in https://cran.r-project.org/web/packages/survival/vignettes/timedep.pdf).
E.g. for your example data:
library(survival)
library(survminer)
vfit <- coxph(Surv(time, status) ~ ., data = veteran)
prop <- cox.zph(vfit)
prop
#> chisq df p
#> trt 0.2644 1 0.60712
#> celltype 15.2274 3 0.00163
#> karno 12.9352 1 0.00032
#> diagtime 0.0129 1 0.90961
#> age 1.8288 1 0.17627
#> prior 2.1656 1 0.14113
#> GLOBAL 34.5525 8 3.2e-05
plot(prop[3])
abline(h = vfit$coefficients[5], lwd = 2, lty = 2, col = "firebrick3")
vet2 <-survSplit(Surv(time, status) ~.,
data = veteran, cut = c(30, 60),
episode = "tgroup", id = "id")
vfit2 <-coxph(Surv(tstart, time, status) ~ trt + prior + karno:strata(tgroup),
data = vet2)
vfit2
#> Call:
#> coxph(formula = Surv(tstart, time, status) ~ trt + prior + karno:strata(tgroup),
#> data = vet2)
#>
#> coef exp(coef) se(coef) z p
#> trt -0.010158 0.989894 0.190075 -0.053 0.957381
#> prior -0.006735 0.993288 0.020252 -0.333 0.739479
#> karno:strata(tgroup)tgroup=1 -0.052164 0.949173 0.008086 -6.451 1.11e-10
#> karno:strata(tgroup)tgroup=2 -0.040127 0.960668 0.011541 -3.477 0.000507
#> karno:strata(tgroup)tgroup=3 -0.008322 0.991712 0.008725 -0.954 0.340150
#>
#> Likelihood ratio test=57.13 on 5 df, p=4.759e-11
#> n= 305, number of events= 128
prop2 <- cox.zph(vfit2)
prop2
#> chisq df p
#> trt 1.217 1 0.270
#> prior 3.898 1 0.048
#> karno:strata(tgroup) 0.235 3 0.972
#> GLOBAL 5.791 5 0.327
plot(prop2[3])
plot(survfit(vfit2, newdata = vet2))
## tt() "log(time + 20)" approach
dtimes <- sort(unique(veteran$time[veteran$status==1]))
vet3 <-survSplit(Surv(time, status) ~.,
data = veteran, cut = dtimes,
episode = "tgroup", id = "id")
vet3$time_var_karno <- vet3$karno * log(vet3$time+20)
vfit4 <- coxph(Surv(tstart,time,status) ~ trt + prior + karno + time_var_karno, data = vet3)
plotdata <- data.frame(
tstart = c(0, dtimes[1:96]),
time = dtimes,
trt = rep(0, 97),
prior = rep(c(0, 5, 10), length.out = 97),
karno = rep(30, 97),
tgroup = rep(c(1,2,3), length.out = 97),
status = rep(0, 97)
)
plotdata$time_var_karno <- plotdata$karno * log(plotdata$time + 20)
plotdata2 <- plotdata
plotdata2$karno <- 60
plotdata2$time_var_karno <- plotdata2$karno * log(plotdata2$time + 20)
plotdata3 <- plotdata
plotdata3$karno <- 90
plotdata3$time_var_karno <- plotdata3$karno * log(plotdata3$time + 20)
plotdata$id <- "1"
plotdata2$id <- "2"
plotdata3$id <- "3"
col_pal <- viridisLite::viridis(n = 4)
plot(survfit(vfit4, newdata = rbind(plotdata,
plotdata2,
plotdata3),
id = id, se.fit = FALSE),
bty = "n", xlab = "Days",
ylab = "Survival probability",
col = col_pal[1:3])
text(160,0.15,"karno = 30", col = col_pal[1])
text(200,0.4,"karno = 60", col = col_pal[2])
text(240,0.6,"karno = 90", col = col_pal[3])
# overlay
plot(survfit(vfit2, newdata = vet2))
lines(survfit(vfit4, newdata = rbind(plotdata,
plotdata2,
plotdata3),
id = id, se.fit = FALSE),
bty = "n", xlab = "Days",
ylab = "Survival probability",
col = col_pal[1:3],
lwd = 3)
Created on 2025-02-08 with reprex v2.1.1
NB: I'm honestly not sure whether this answer is 'correct' or not. It looks good, i.e. the time_var_karno-corrected lines provide a very close approximation of the original ggsurvplot, but if you're publishing this in a journal I recommend asking a stats professor to critique it beforehand. I also think it would be worth asking for advice at https://stats.stackexchange.com and see if any of the 'heavy hitters' are willing to discuss further.