Search code examples
rsurvival-analysiscox-regression

Time to event analysis in R to explore the effect of a biomarker on an event risk


In R, I analyse time-to-event data to explore the effect of a biomarker on an event risk, adjusting for gender. To do this, I work with data that looks like this toy dataset:

> head(data)
  pt sex age_baseline t_event death  t_stop biomarker
1  1   M         24345    3632     0   3981      0.22
2  2   F         25951    1121     0   3693      0.14
3  3   F         26900      NA     0   4437      0.04
4  4   F         27521    4896     1   5420      0.35
5  5   F         25660      NA     0   4035      0.25

Description of columns:

  • pt: individual's ID
  • sex: gender (M=male, F=female)
  • age_baseline: the age of the individual (in days) at the start of the study
  • t_stop: time (in days) between baseline and death / last news
  • death: death (0=no, 1=yes)
  • t_event: time (in days) between baseline and first occurrence of the event (NA = no event)
  • biomarker: level of biomarker at baseline

For clarity, an individual may experience event x, and later die of an unrelated cause.

To do so, I'd like to use the coxph() R function to compute the Cox model.

I thought of the following code:

coxph(
  Surv(time = age_baseline, time2 = t_stop, event = death) ~ t_event + sex, 
       data = data.coro)

Does this make sense for my original research question? How can I include biomarker?

Also, does the Cox model handle missing data (NAs in t_event)?


Solution

  • When you want to use a Cox PH model, your data set should contain an event time and information whether the event happened or whether there is censoring:

    data <- data.frame(pt = seq(1,5), sex = c("M",rep("F",4)), age_baseline = c(24345,25951,26900,27521,25660), 
                           t_event = c(3632,1121,NA,4896,NA), death = c(0,0,0,1,0), t_stop = c(3981,3693,4437,5420,4035), 
                           biomarker = c(0.22,0.14,0.04,0.35,0.25))
    
    
    # If there is no event, the last follow-up is the censor date
    data$t_event[which(is.na(data$t_event))] <- data$t_stop[which(is.na(data$t_event))]
    # Adding an event description. If no event, that means the patient gets censored
    data$event <- ifelse(is.na(data$t_event),0,1)
    

    Now that you have both the event time and event information (note that above also handles your NA), you can build your Cox model with both sex and biomarker as covariate:

    coxph(Surv(t_event, event) ~ sex + biomarker, data = data)
    

    This gives:

    > coxph(Surv(t_event, event) ~ sex + biomarker, data = data)
    Call:
    coxph(formula = Surv(t_event, event) ~ sex + biomarker, data = data)
    
                  coef exp(coef) se(coef)      z     p
    sexM       1.38086   3.97831  1.44998  0.952 0.341
    biomarker -3.34641   0.03521  4.48310 -0.746 0.455
    
    Likelihood ratio test=1.32  on 2 df, p=0.5167
    n= 5, number of events= 5