Search code examples
rpredictionconfidence-intervalcox-regression

Confidence intervals for survival times using survfit function in R for multiple new data points


I think the solution to this is probably quite straight forward but I cannot figure it out.

I have fitted a cox proportional hazards model to a data set. I want to make absolute risk predictions for new data points, as would be done in a risk prediction model. For example, P(T>t) where T is the time until an event of interest.

Mathematically I understand this process. The cumulative hazard function must be estimated first, one example of a formula for doing this can be found here: https://stats.stackexchange.com/questions/46532/cox-baseline-hazard

The survival function is then a simple function of the cumulative hazard function, S(t)=exp⁡(−H(t)), formula can be found here: https://stats.stackexchange.com/questions/58046/proof-of-relationship-between-hazard-rate-probability-density-survival-functio

After fitting a cox proportional hazards model, called fit1, this can all be done using the survfit and summary functions in R, for a new data point, newdata:

newdata1 = data.frame(x1=0,x2=3,x3=5)

summary(survfit(fit1, newdata, type="aalen",se.fit = TRUE, conf.int = 0.95),times=50)

This gives P(T>50), and the following output:

 time n.risk n.event survival std.err lower 95% CI upper 95% CI
   50  14462    3216    0.835 0.00631        0.823        0.848

Crucially this gives an upper and lower CI for the survival probability. Now if I want to do this for multiple data points, so I define:

newdata1 = data.frame(x1=0,x2=3,x3=5)
newdata2 = data.frame(x1=1,x2=1,x3=2)

newdata=rbind(newdata1,newdata2)

summary(survfit(fit1, newdata, type="kalb",se.fit = TRUE, conf.int = 0.95),times=50)

I get the following output:

time n.risk n.event survival1 survival2
  50  14462    3216     0.835     0.822

It gives the survival probability for each patient, but not the associated confidence intervals. This is despite confidence intervals being requested by conf.int=0.95.

So the questions is: How can I get confidence intervals around the survival probabilities when getting predicted survival probabilities for more than one data point?


Solution

  • What you are seeing is not exactly the output of function, rather a summary of output where different behaviors are defined for different numbers of rows in the newdata. You can see that from the following example

    fit <- coxph(Surv(futime, fustat) ~ age, data=ovarian) 
    (CI_summary1 <- summary(survfit(fit, newdata=data.frame(age=c(60)), type="aalen",se.fit = TRUE, conf.int = 0.95),times=50))
    (CI_summary2 <- summary(survfit(fit, newdata=data.frame(age=c(60,70)), type="aalen",se.fit = TRUE, conf.int = 0.95),times=50))
    str(CI_summary2)
    

    The confidence intervals you are looking for are in CI_summary2$upper and CI_summary2$lower.