Tags: r, data.table, survival

How to include a Surv object as a new column in a data.table?


I'm performing survival analysis and want to store a Surv object as its own column in a data.table. Although Surv objects behave like vectors, I can't use one to create a new column because it is actually a two-column matrix. Is there an elegant way to include Surv objects without splitting them into separate columns?

This is what a Surv object looks like.

DT[,Surv(time, status)]
#>  [1]   9   13   13+  18   23   28+  31   34   45+  48  161+   5    5    8 
#> [15]   8   12   16+  23   27   30   33   43   45

Here is an example of what I want to do:

library(data.table)
library(survival)

DF <- as.data.frame(survival::aml)
DT <- as.data.table(survival::aml)

# Does work
DF$survival <- Surv(DF$time, DF$status)

# Does not work
DT[,survival:=Surv(time, status)]

Solution

  • It's not yet clear what the plan is for such a column, but if the goal is to do survival modeling inside the data.table environment, then constructing a Surv object separately is not necessary. Instead, get comfortable with putting complete expressions in the data.table j-position:

    > DT[ , coxph( Surv(time, status) ~ 1, data=.SD) ]
    Call:  coxph(formula = Surv(time, status) ~ 1, data = .SD)
    
    Null model
      log likelihood= -42.72484 
      n= 23 
    

    Inside j, data.table evaluates column names as variables, without quotes, so the formula can refer to the columns directly and no data argument is needed:

    > DT[ , summary(coxph( Surv(time, status) ~ x)) ]
    Call:
    coxph(formula = Surv(time, status) ~ x)
    
      n= 23, number of events= 18 
    
                     coef exp(coef) se(coef)     z Pr(>|z|)  
    xNonmaintained 0.9155    2.4981   0.5119 1.788   0.0737 .
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    
                   exp(coef) exp(-coef) lower .95 upper .95
    xNonmaintained     2.498     0.4003    0.9159     6.813
    
    Concordance= 0.619  (se = 0.063 )
    Likelihood ratio test= 3.38  on 1 df,   p=0.07
    Wald test            = 3.2  on 1 df,   p=0.07
    Score (logrank) test = 3.42  on 1 df,   p=0.06
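
    Because j hands back the fitted object unchanged, as the printed output above suggests, the result can also be stored and tabulated. A minimal sketch, assuming the same aml data; the names fit and res are illustrative, not part of either package:

    ```r
    library(data.table)
    library(survival)

    DT <- as.data.table(survival::aml)

    # Fit the model inside j; the coxph object comes back as-is
    fit <- DT[, coxph(Surv(time, status) ~ x, data = .SD)]

    # Collect the estimates into a small data.table (column names are illustrative)
    res <- data.table(
      term         = names(coef(fit)),
      coef         = unname(coef(fit)),
      hazard_ratio = unname(exp(coef(fit)))
    )
    res
    ```

    Keeping the fit in its own object, rather than in a column, sidesteps the matrix-column restriction altogether.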
    

    In fact, constructing Surv objects separately, outside the coxph call, is a practice that regularly brings questions to the R-help mailing list, because a Surv object built outside takes its variables from globalenv() rather than from the data frame offered to coxph. Terry Therneau, the author of the survival package, warns people NOT to make separate Surv objects. This is entirely separate from any issue with encapsulating matrices in a data.table, but hopefully it reduces the frustration with this barrier.
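
    One way to follow that advice, sketched under the assumption that time and status simply stay as ordinary columns: build the Surv response inside the formula each time it is needed and pass the data.table as data. The object name km is illustrative.

    ```r
    library(data.table)
    library(survival)

    DT <- as.data.table(survival::aml)

    # Surv() is built inside the formula, so time and status are looked up in DT
    km <- survfit(Surv(time, status) ~ x, data = DT)

    # Per-group summary table (events, median survival time, confidence limits)
    summary(km)$table
    ```

    The same inline pattern applies to coxph and survdiff, so no stored Surv column is needed for routine modeling.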