Search code examples
rsurvival-analysiscox-regressioncox

What is the default of the time transform function in the coxph command?


Short introduction:

A Cox Proportional Hazards (PH) model can be estimated with the coxph function of the survival package. An obvious requirement to get sensible results from this type of model is that the hazards are proportional, i.e., they are constant over time. If this is not the case for a certain variable, it can be solved by making the coefficient of this variable time varying. (Now it is technically an extended Cox model.) This is done by adding the tt() to that variable and specifying a function over time (see vignette("timedep", package = "survival") page 19+).

Question:

Which function is used if tt() is used without specifying a function?

Here is an example:

library(survival)
data(lung)
cox_model <- coxph(Surv(time, status) ~ age + sex + ph.karno, data = lung)
cox_model_ph <- cox.zph(cox_model) 
#              rho    chisq       p
# age      0.00701  0.00871 0.92566
# sex      0.12249  2.42336 0.11954
# ph.karno 0.23135  8.24167 0.00409
# GLOBAL        NA 11.54750 0.00911

We see that ph.karno violates the PH assumption (small p-value), so add tt():

cox_model_tt <- coxph(Surv(time, status) ~ age + sex + tt(ph.karno), data = lung)
cox_model_tt_ph <- cox.zph(cox_model_tt)
#                   rho  chisq      p
# age          -0.00907 0.0142 0.9052
# sex           0.12844 2.7270 0.0987
# tt(ph.karno)  0.11643 2.3846 0.1225
# GLOBAL             NA 5.0220 0.1702

Now the PH assumption is satisfied, but I have no idea what the tt() function actually did. I tried some common used function such as tt = function(x, t, ...) x*t, tt = function(x, t, ...) x + t, tt = function(x, t, ...) x*log(t). But all gave different results (and were unable to fix the PH violation).

Any help is appreciated.


Solution

  • Looking through the code for coxph I think if I have found it. You offered no value for the 'tt'-parameter, so I think this gets executed:

    if (is.null(tt)) {
                tt <- function(x, time, riskset, weights) {
                    obrien <- function(x) {
                      r <- rank(x)
                      (r - 0.5)/(0.5 + length(r) - r)
                    }
                    unlist(tapply(x, riskset, obrien))
                }
    

    And here's an experimental confirmation:

    > cox_model_OB <- coxph(Surv(time, status) ~ age + sex + tt(ph.karno), data = lung, tt=  function(x, time, riskset, weights) {
    +                 obrien <- function(x) {
    +                   r <- rank(x)
    +                   (r - 0.5)/(0.5 + length(r) - r)
    +                 }
    +                 unlist(tapply(x, riskset, obrien))
    +             }
    + )
    > ( cox_model_tt_ph <- cox.zph(cox_model_tt) )
                      rho  chisq      p
    age          -0.00907 0.0142 0.9052
    sex           0.12844 2.7270 0.0987
    tt(ph.karno)  0.11643 2.3846 0.1225
    GLOBAL             NA 5.0220 0.1702
    

    I'm wondering if this was intentional. I suspect it is code left in during a development session. I suspect that Therneau intends that failure to offer a 'tt'-function should throw at least a warning, but probably would have preferred an error. So that was a guess and I found that I was wrong by searching through the vignettes and find that it is intended: "This relies on the fact that the input arguments to tt() are ordered by the event number or risk set. This function is used as a default if no tt argument is present in the coxph call, but there are tt terms in the model formula. (Doing so allowed me to depreciate the survobrien function)." ref: page 23 of "Using Time Dependent Covariates and Time Dependent Coefficients in the Cox Model"from the current survival package Index help page links to Vignettes.