I would like to know if there is an easy way to estimate the (cumulative) baseline hazard from a Cox model with time-varying coefficients over different time intervals. After creating the time-split data with survSplit(), the predict.coxph() method with type='expected' gives expected values per row, which I suspect counts the same subject multiple times. Is there an easy way to obtain these estimates, and am I thinking about this correctly? Let's discuss it further through an example:
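```r
library(survival)
library(riskRegression)  # provides the Melanoma data
data(Melanoma)
# Split follow-up at day 1095; 'tgroup' records the interval of each row
d <- survSplit(formula = Surv(time, status == 1) ~ age + sex + epicel,
               data = Melanoma,
               cut = 1095,
               episode = 'tgroup',
               id = 'id')
# age:strata(tgroup) gives age a separate coefficient in each interval
fit <- coxph(Surv(tstart, time, event) ~ age:strata(tgroup) + sex + epicel,
             data = d, x = TRUE)
```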
Now, for example, consider a patient who was censored at time = 1700. If we want to estimate the baseline hazard at times 800 and 1500, the patient is in the risk set at both times but with different linear predictors (because the time cut-point was set at 1095). It looks like predict.coxph() doesn't take this into account. Am I thinking correctly? Is there an adjustment to predict.coxph(), or another function that does this automatically, or do I need to write the function myself? I want to use these values to obtain absolute risk estimates for each patient. Thanks in advance.
I used the example above as a demonstration.
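A minimal sketch of what the row-wise values represent, assuming the riskRegression Melanoma data and the same split and model as in the question. As I understand the survival documentation, for (start, stop] data predict(..., type = 'expected') returns the expected number of events within each row's own interval, using that row's covariates, so adding up a subject's rows gives the cumulative hazard over that subject's whole follow-up rather than counting any stretch of time twice. The last lines pick a subject followed past day 1500 to show the two rows that are at risk at days 800 and 1500.

```r
library(survival)
library(riskRegression)  # for the Melanoma data

data(Melanoma)
d <- survSplit(Surv(time, status == 1) ~ age + sex + epicel,
               data = Melanoma, cut = 1095, episode = "tgroup", id = "id")
fit <- coxph(Surv(tstart, time, event) ~ age:strata(tgroup) + sex + epicel, data = d)

# Row-wise expected counts: each row covers only its own (tstart, time] interval
# and uses the covariates (and hence linear predictor) of that row
row_expected <- predict(fit, type = "expected")

# Adding up the rows of a subject gives that subject's cumulative hazard over
# the whole follow-up; the rows of one subject do not overlap in time
subj_expected <- rowsum(row_expected, d$id)
head(subj_expected)

# One age coefficient per tgroup (plus sex and epicel)
coef(fit)

# A subject followed past day 1500 has two rows: the tgroup-1 row is at risk at
# day 800 and the tgroup-2 row at day 1500, so age switches coefficient at 1095
sid <- d$id[d$tgroup == 2 & d$time >= 1500][1]
d[d$id == sid, c("id", "tstart", "time", "event", "tgroup", "age")]
```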
I managed to figure out a way to deal with it myself. survfit() works with time-split data for time-varying coefficients if the newdata argument is also shaped in the time-split format. Here we put the information of a single patient, with all covariates at their baseline (zero or reference) values, into that format to get the cumulative baseline hazard.
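```r
library(tidyverse)
# A reference patient (age = 0, reference levels of sex and epicel), written in
# the same split format as d: one row per time interval
nd <- data.frame(tstart = c(0, 1095),
                 time = c(1095, max(Melanoma$time)),
                 event = 0,
                 tgroup = 1:2,
                 id = 1,
                 age = 0,
                 sex = 'Female',
                 epicel = 'not present')
# id = id tells survfit() that these rows are one subject's covariate path
sfit <- survfit(fit, newdata = nd, id = id)
# The cumulative hazard is a step function, so geom_step() draws it faithfully
ggplot(data = tibble('time' = sfit$time, 'Cumulative baseline hazard' = sfit$cumhaz)) +
  geom_step(aes(x = time, y = `Cumulative baseline hazard`), linewidth = .75) +
  theme_light()
```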
These values, along with the linear predictors, can then be used to build the absolute risk estimates. This vignette is a comprehensive and helpful description of time-varying covariates and coefficients. The response here is also really helpful.
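A minimal sketch of that last step, assuming the same split and model as in the question and taking the first row of Melanoma purely as an illustrative patient. The same split-format newdata can carry a real patient's covariates, and 1 - S(t) from survfit() is then that patient's absolute risk by time t. The second half rebuilds the same numbers by hand from the reference curve and the interval-specific linear predictors; the two routes should agree. Deaths from other causes (status 2) are treated as censored here, as they are in the model.

```r
library(survival)
library(riskRegression)  # for the Melanoma data

data(Melanoma)
d <- survSplit(Surv(time, status == 1) ~ age + sex + epicel,
               data = Melanoma, cut = 1095, episode = "tgroup", id = "id")
fit <- coxph(Surv(tstart, time, event) ~ age:strata(tgroup) + sex + epicel, data = d)

# Covariate path of one patient in split format (first row of Melanoma, illustration only)
pt <- Melanoma[1, ]
nd_i <- data.frame(tstart = c(0, 1095), time = c(1095, max(Melanoma$time)),
                   event = 0, tgroup = 1:2, id = 1,
                   age = pt$age, sex = pt$sex, epicel = pt$epicel)

# Direct route: survfit() follows the path, so 1 - S_i(t) is the absolute risk by t
horizons <- c(800, 1500)
sfit_i <- survfit(fit, newdata = nd_i, id = id)
risk_i <- 1 - summary(sfit_i, times = horizons)$surv

# By hand: reference (baseline) cumulative hazard as a right-continuous step function
nd0 <- transform(nd_i, age = 0, sex = "Female", epicel = "not present")
sfit0 <- survfit(fit, newdata = nd0, id = id)
H0 <- stepfun(sfit0$time, c(0, sfit0$cumhaz))

# Linear predictors relative to the reference patient, one per interval; any
# centering predict() applies depends at most on tgroup, so it cancels row by row
lp <- predict(fit, newdata = nd_i, type = "lp") - predict(fit, newdata = nd0, type = "lp")

# Lambda_i(t) = exp(lp1) * H0(min(t, 1095)) + exp(lp2) * (H0(t) - H0(1095)) for t > 1095
cumhaz_i <- function(t) {
  exp(lp[1]) * H0(pmin(t, 1095)) + exp(lp[2]) * pmax(H0(t) - H0(1095), 0)
}
risk_manual <- 1 - exp(-cumhaz_i(horizons))

rbind(survfit = risk_i, by_hand = risk_manual)
```

Repeating this with each patient's covariates in place of pt gives one absolute risk per patient.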