Search code examples
rsurvival-analysiscox-regression

predict.coxph() and survC1::Est.Cval -- type for predict() output


Given a coxph() model, I want to use predict() to predict hazards and then use survC1::Est.Cval( . . . nofit=TRUE) to get a c-value for the model.

The Est.Cval() documentation is rather terse, but says that "nofit=TRUE: If TRUE, the 3rd column of mydata is used as the risk score directly in calculation of C."

Say, for simplicity, that I want to predict on the same data I built the model on. For

  • coxModel a Cox regression model from coxph();
  • time a vector of times (positive reals), the same times that coxModel was built on; and
  • event a 0/1 vector, the same length, of event/censor indicators, the same events that coxModel was built on --

does this indicate that I want

predictions <- predict(coxModel, type="risk")

dd <- cbind(time, event, pred)

Est.Cval(mydata=dd, tau=tau, nofit=TRUE)

or should that first line be

predictions <- predict(coxModel, type="lp")

?

Thanks for any help,


Solution

  • The answer is that it doesn't matter.

    Basically, the concordance value is testing, for all comparable pairs of times (events and censors), how probable it is that the later time has the lower risk (for a really good model, almost always). But since e^u is a monotonic function of real u, and the c-value is only testing comparisons, it doesn't matter whether you provide the hazard ratio, e^(sum{\beta_i x_i}), or the linear predictor, sum{\beta_i x_i}.

    Since @42 motivated me to come up with a minimal working example, we can test this. We'll compare the values that Est.Cval() provides using one input versus using the other; and we can compare both to the value we get from coxph().

    (That last value won't match exactly, because Est.Cval() uses the method of Uno et al. 2011 (Uno, H., Cai, T., Pencina, M. J., D’Agostino, R. B. & Wei, L. J. On the C-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Statist. Med. 30, 1105–1117 (2011), https://onlinelibrary.wiley.com/doi/full/10.1002/sim.4154) but it can serve as a sanity check, since the values should be close.)

    The following is based on the example worked through in Survival Analysis with R, 2017-09-25, by Joseph Rickert, https://rviews.rstudio.com/2017/09/25/survival-analysis-with-r/.

    library("survival")
    library("survC1")
    
    # Load dataset included with survival package
    data("veteran")
    # The variable `time` records survival time; `status` indicates whether the 
    # patient’s death was observed (status=1) or that survival time was censored 
    # (status = 0). 
    
    # The model they build in the example:
    coxModel <- coxph(Surv(time, status) ~ trt + celltype + karno + diagtime + 
        age + prior, data=veteran)
    
    # The results
    summary(coxModel)
    

    Note the c-score it gives us:

    Concordance= 0.736  (se = 0.021 )
    

    Now, we calculate the c-score given by Est.Cval() on the two types of values:

    # The value from Est.Cval(), using a risk input
    cvalByRisk <- Est.Cval(mydata=cbind(time=veteran$time, event=veteran$status, 
      predictions=predict(object=coxModel, newdata=veteran, type="risk")), 
      tau=2000, nofit=TRUE)
    
    # The value from Est.Cval(), using a linear predictor input
    cvalByLp <- Est.Cval(mydata=cbind(time=veteran$time, event=veteran$status, 
      predictions=predict(object=coxModel, newdata=veteran, type="lp")), 
      tau=2000, nofit=TRUE)
    

    And we compare the results:

    cvalByRisk$Dhat
    [1] 0.7282348
    cvalByLp$Dhat
    [1] 0.7282348