r, survival-analysis

(Cumulative) baseline hazard in Cox models with time-dependent coefficients


I would like to know whether there is an easy way to estimate the (cumulative) baseline hazard from a Cox model whose coefficients vary across time intervals. After creating the time-split data with survSplit(), predict.coxph() with type = 'expected' returns one expected value per row, so I suspect it counts the same subject several times (once per interval). Is there an easy way to obtain per-subject estimates, and am I thinking about this correctly? Here is an example:

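library(survival)
library(riskRegression)  # provides the Melanoma data

data(Melanoma)

# Split follow-up at day 1095: tgroup = 1 for (0, 1095], tgroup = 2 afterwards
d <- survSplit(formula = Surv(time, status == 1) ~ age + sex + epicel,
               data = Melanoma,
               cut = 1095,
               episode = 'tgroup',
               id = 'id')

# age:strata(tgroup) gives a separate age coefficient for each time interval
fit <- coxph(Surv(tstart, time, event) ~ age:strata(tgroup) + sex + epicel, data = d, x = TRUE)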
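A minimal sketch of the per-row behaviour, assuming that for counting-process data each type = "expected" value covers only that row's own interval (tstart, time]. Under that assumption the rows are not double counts: summing the rows of each subject gives one value per patient, namely the fitted cumulative hazard at that patient's own exit time. The names e_row and e_id are illustrative only.

```r
library(survival)
library(riskRegression)  # provides the Melanoma data

data(Melanoma)

# Same split and model as in the question
d <- survSplit(Surv(time, status == 1) ~ age + sex + epicel, data = Melanoma,
               cut = 1095, episode = "tgroup", id = "id")
fit <- coxph(Surv(tstart, time, event) ~ age:strata(tgroup) + sex + epicel,
             data = d, x = TRUE)

# One expected value per row of the split data (assumed: per (tstart, time] interval)
e_row <- predict(fit, type = "expected")

# Collapse to one value per subject by summing that subject's intervals
e_id <- tapply(e_row, d$id, sum)

length(e_row)  # one entry per row of d
length(e_id)   # one entry per patient
```

This only recovers each patient's cumulative hazard at their exit time; it does not give values at arbitrary times such as 800 or 1500.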

Now, for example, take a patient who was censored at time = 1700. If we want to estimate the baseline hazard at times 800 and 1500, the patient is in the risk set at both times, but with different linear predictors (because the time cut-point was set at 1095). It looks like predict.coxph() doesn't take this into account. Am I thinking correctly? Is there an adjustment to predict.coxph(), or another function that does this automatically, or do I need to write the function myself? I want to use these values to obtain absolute risk estimates for each patient. Thanks in advance.
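The quantity in question can be written out as a sketch, in notation introduced here: Λ0 is the cumulative baseline hazard at zero covariates (age = 0, reference levels of sex and epicel, so sex_i and epicel_i are 0/1 indicators), and η_ik is patient i's linear predictor in time group k, using the age coefficient for tgroup k. The cumulative hazard then accumulates piecewise across the cut at 1095:

```latex
\eta_{ik} = \beta_{\mathrm{age},k}\,\mathrm{age}_i + \beta_{\mathrm{sex}}\,\mathrm{sex}_i + \beta_{\mathrm{epicel}}\,\mathrm{epicel}_i, \qquad k = 1, 2

H_i(t) = \int_0^t e^{\eta_i(s)}\, d\Lambda_0(s) =
\begin{cases}
e^{\eta_{i1}}\,\Lambda_0(t), & t \le 1095,\\
e^{\eta_{i1}}\,\Lambda_0(1095) + e^{\eta_{i2}}\,\{\Lambda_0(t) - \Lambda_0(1095)\}, & t > 1095,
\end{cases}

S_i(t) = \exp\{-H_i(t)\}
```

For the patient censored at 1700, H_i(800) = e^{η_i1} Λ0(800) uses only the first age coefficient, while H_i(1500) = e^{η_i1} Λ0(1095) + e^{η_i2} {Λ0(1500) − Λ0(1095)} combines both.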

I used the example above as a demonstration.


Solution

  • I managed to figure this out myself. survfit() works with time-split data and time-varying coefficients if the newdata argument is also in time-split format, with one row per interval and an id column identifying the subject. Here we build newdata for a single patient with age = 0 and the reference levels of sex and epicel, so the resulting cumulative hazard is the cumulative baseline hazard.

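    library(tidyverse)  # for tibble() and ggplot2
    
    # One hypothetical patient, split at the same cut point (1095) as the model data.
    # All covariates sit at their reference values, so the curve is the baseline.
    nd <- data.frame(tstart = c(0, 1095),
                     time = c(1095, max(Melanoma$time)),
                     event = 0,
                     tgroup = 1:2,
                     id = 1,
                     age = 0,
                     sex = 'Female',
                     epicel = 'not present')
    
    # id = id tells survfit() that both rows describe the same subject's path
    sfit <- survfit(fit, newdata = nd, id = id)
    
    ggplot(data = tibble('time' = sfit$time, 'Cumulative baseline hazard' = sfit$cumhaz)) +
      geom_line(aes(x = time, y = `Cumulative baseline hazard`), linewidth = .75) +
      theme_light()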
    

    These values, together with each patient's linear predictors, can then be used to construct absolute risk estimates. This vignette gives a comprehensive and helpful description of time-varying covariates and coefficients. The response here is also really helpful.
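A minimal sketch, assuming the same split (cut = 1095) and model as in the question, of repeating the survfit() approach for every patient: each patient's own covariates are copied onto both intervals, survfit() applies the tgroup-specific age coefficient on each side of day 1095, and the cumulative hazard H is read off at days 800 and 1500 (the times used in the question) with stepfun(). The absolute risk is taken as 1 - exp(-H); because the model treats deaths from other causes (status == 2) as censored, this is the risk under that single-cause model.

```r
library(survival)
library(riskRegression)  # provides the Melanoma data

data(Melanoma)

# Same split and model as in the question
d <- survSplit(Surv(time, status == 1) ~ age + sex + epicel, data = Melanoma,
               cut = 1095, episode = "tgroup", id = "id")
fit <- coxph(Surv(tstart, time, event) ~ age:strata(tgroup) + sex + epicel,
             data = d, x = TRUE)

times <- c(800, 1500)
H <- matrix(NA_real_, nrow = nrow(Melanoma), ncol = length(times),
            dimnames = list(NULL, paste0("day", times)))

for (i in seq_len(nrow(Melanoma))) {
  # Patient i's covariates copied onto both intervals of the split
  nd <- data.frame(tstart = c(0, 1095), time = c(1095, max(Melanoma$time)),
                   event = 0, tgroup = 1:2, id = 1,
                   age = Melanoma$age[i], sex = Melanoma$sex[i],
                   epicel = Melanoma$epicel[i])
  sfit <- survfit(fit, newdata = nd, id = id)
  # Right-continuous step function through the estimated cumulative hazard
  H[i, ] <- stepfun(sfit$time, c(0, sfit$cumhaz))(times)
}

# Absolute risk of status == 1 by each time (other deaths treated as censored)
risk <- 1 - exp(-H)
head(risk)
```

Rows of risk follow the row order of Melanoma. The loop runs at top level so that survfit() looks up the id column in nd exactly as in the original call.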