In R, I analyse time-to-event data to explore the effect of a biomarker on an event risk, adjusting for gender. To do this, I work with data that looks like this toy dataset:
> head(data)
pt sex age_baseline t_event death t_stop biomarker
1 1 M 24345 3632 0 3981 0.22
2 2 F 25951 1121 0 3693 0.14
3 3 F 26900 NA 0 4437 0.04
4 4 F 27521 4896 1 5420 0.35
5 5 F 25660 NA 0 4035 0.25
Description of columns:
For clarity, an individual may experience event x, and later die of an unrelated cause.
To do so, I'd like to use the coxph()
R function to compute the Cox model.
I thought of the following code:
coxph(
Surv(time = age_baseline, time2 = t_stop, event = death) ~ t_event + sex,
data = data.coro)
Does this make sense for my original research question? How can I include biomarker
?
Also, does the Cox model handle missing data (NAs in t_event
)?
When you want to use a Cox PH model, your data set should contain an event time
and information whether the event
happened or whether there is censoring:
data <- data.frame(pt = seq(1,5), sex = c("M",rep("F",4)), age_baseline = c(24345,25951,26900,27521,25660),
t_event = c(3632,1121,NA,4896,NA), death = c(0,0,0,1,0), t_stop = c(3981,3693,4437,5420,4035),
biomarker = c(0.22,0.14,0.04,0.35,0.25))
# If there is no event, the last follow-up is the censor date
data$t_event[which(is.na(data$t_event))] <- data$t_stop[which(is.na(data$t_event))]
# Adding an event description. If no event, that means the patient gets censored
data$event <- ifelse(is.na(data$t_event),0,1)
Now that you have both the event time
and event
information (note that above also handles your NA
), you can build your Cox model with both sex
and biomarker
as covariate:
coxph(Surv(t_event, event) ~ sex + biomarker, data = data)
This gives:
> coxph(Surv(t_event, event) ~ sex + biomarker, data = data)
Call:
coxph(formula = Surv(t_event, event) ~ sex + biomarker, data = data)
coef exp(coef) se(coef) z p
sexM 1.38086 3.97831 1.44998 0.952 0.341
biomarker -3.34641 0.03521 4.48310 -0.746 0.455
Likelihood ratio test=1.32 on 2 df, p=0.5167
n= 5, number of events= 5