Search code examples
rsurvival-analysis

Survival analysis on left truncated data with ipredbagg or pecCforest


I'm looking for ways to uses tree-like algoritms to perform a survival analysis on left-truncated, right censored data. I tried the packages ipred and pec, but the functions ipredbagg and pecCforest seem only to work without left truncation.

Data description

My data looks a lot like the heart dataset from the Stanford Heart Transplant data. The objects are in reality at risk from t=0, but some objects (for me the huge majority) enter the survey only at a later time t1, so when they would have died at t < t1 they would not have entered the dataset. It has been shown that when this left truncation is neglected, huge errors can arise, see e.g. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3121224/ Besides the left truncation my dataset is also right sensored, just as the heart dataset, so some objects leave the dataset without having died/shown the relevant event.

The heart dataset looks like this

Surv(heart$start, heart$stop, heart$event)

[1] (  0.0,  50.0]  (  0.0,   6.0]  (  0.0,   1.0+] (  1.0,  16.0]  (  0.0,  36.0+] ( 36.0,  39.0] 
[7] (  0.0,  18.0]  (  0.0,   3.0]  (  0.0,  51.0+] ( 51.0, 675.0]  (  0.0,  40.0]  (  0.0,  85.0] 
[13] (  0.0,  12.0+] ( 12.0,  58.0]  (  0.0,  26.0+] ( 26.0, 153.0]  (  0.0,   8.0]  (  0.0,  17.0+]
[19] ( 17.0,  81.0]  (  0.0,  37.0+] ( 37.0,1387.0]  (  0.0,   1.0]  (  0.0,  28.0+] ( 28.0, 308.0] 
[25] (  0.0,  36.0]  (  0.0,  20.0+] ( 20.0,  43.0]  (  0.0,  37.0]  (  0.0,  18.0+] ( 18.0,  28.0] 
[31] (  0.0,   8.0+] (  8.0,1032.0]  (  0.0,  12.0+] ( 12.0,  51.0]  (  0.0,   3.0+] (  3.0, 733.0] 
[37] (  0.0,  83.0+] ( 83.0, 219.0]  (  0.0,  25.0+] ( 25.0,1800.0+] (  0.0,1401.0+] (  0.0, 263.0] 
[43] (  0.0,  71.0+] ( 71.0,  72.0]  (  0.0,  35.0]  (  0.0,  16.0+] ( 16.0, 852.0]  (  0.0,  16.0] 
[49] (  0.0,  17.0+] ( 17.0,  77.0]  (  0.0,  51.0+] ( 51.0,1587.0+] (  0.0,  23.0+] ( 23.0,1572.0+]
[55] (  0.0,  12.0]  (  0.0,  46.0+] ( 46.0, 100.0]  (  0.0,  19.0+] ( 19.0,  66.0]  (  0.0,   4.5+]
[61] (  4.5,   5.0]  (  0.0,   2.0+] (  2.0,  53.0]  (  0.0,  41.0+] ( 41.0,1408.0+] (  0.0,  58.0+]
[67] ( 58.0,1322.0+] (  0.0,   3.0]  (  0.0,   2.0]  (  0.0,  40.0]  (  0.0,   1.0+] (  1.0,  45.0] 
[73] (  0.0,   2.0+] (  2.0, 996.0]  (  0.0,  21.0+] ( 21.0,  72.0]  (  0.0,   9.0]  (  0.0,  36.0+]
[79] ( 36.0,1142.0+] (  0.0,  83.0+] ( 83.0, 980.0]  (  0.0,  32.0+] ( 32.0, 285.0]  (  0.0, 102.0] 
[85] (  0.0,  41.0+] ( 41.0, 188.0]  (  0.0,   3.0]  (  0.0,  10.0+] ( 10.0,  61.0]  (  0.0,  67.0+]
[91] ( 67.0, 942.0+] (  0.0, 149.0]  (  0.0,  21.0+] ( 21.0, 343.0]  (  0.0,  78.0+] ( 78.0, 916.0+]
[97] (  0.0,   3.0+] (  3.0,  68.0]  (  0.0,   2.0]  (  0.0,  69.0]  (  0.0,  27.0+] ( 27.0, 842.0+]
[103] (  0.0,  33.0+] ( 33.0, 584.0]  (  0.0,  12.0+] ( 12.0,  78.0]  (  0.0,  32.0]  (  0.0,  57.0+]
[109] ( 57.0, 285.0]  (  0.0,   3.0+] (  3.0,  68.0]  (  0.0,  10.0+] ( 10.0, 670.0+] (  0.0,   5.0+]
[115] (  5.0,  30.0]  (  0.0,  31.0+] ( 31.0, 620.0+] (  0.0,   4.0+] (  4.0, 596.0+] (  0.0,  27.0+]
[121] ( 27.0,  90.0]  (  0.0,   5.0+] (  5.0,  17.0]  (  0.0,   2.0]  (  0.0,  46.0+] ( 46.0, 545.0+]
[127] (  0.0,  21.0]  (  0.0, 210.0+] (210.0, 515.0+] (  0.0,  67.0+] ( 67.0,  96.0]  (  0.0,  26.0+]
[133] ( 26.0, 482.0+] (  0.0,   6.0+] (  6.0, 445.0+] (  0.0, 428.0+] (  0.0,  32.0+] ( 32.0,  80.0] 
[139] (  0.0,  37.0+] ( 37.0, 334.0]  (  0.0,   5.0]  (  0.0,   8.0+] (  8.0, 397.0+] (  0.0,  60.0+]
[145] ( 60.0, 110.0]  (  0.0,  31.0+] ( 31.0, 370.0+] (  0.0, 139.0+] (139.0, 207.0]  (  0.0, 160.0+]
[151] (160.0, 186.0]  (  0.0, 340.0]  (  0.0, 310.0+] (310.0, 340.0+] (  0.0,  28.0+] ( 28.0, 265.0+]
[157] (  0.0,   4.0+] (  4.0, 165.0]  (  0.0,   2.0+] (  2.0,  16.0]  (  0.0,  13.0+] ( 13.0, 180.0+]
[163] (  0.0,  21.0+] ( 21.0, 131.0+] (  0.0,  96.0+] ( 96.0, 109.0+] (  0.0,  21.0]  (  0.0,  38.0+]
[169] ( 38.0,  39.0+] (  0.0,  31.0+] (  0.0,  11.0+] (  0.0,   6.0] 

So at the first time in each interval the object enters my set and starts to be "at risk". At the second time the object leaves the set, some because the interesting event happens (without a '+') and others are censored (with '+').

Cox regression

For the Cox regression everything works fine. This Surv-object created above can be used to perform a Cox regression.

coxtime=coxph(Surv(heart$start, heart$stop, heart$event)~1,data=heart)
summary(coxtime)
Call:  coxph(formula = Surv(heart$start, heart$stop, heart$event) ~ 
1, data = heart)

Null model
  log likelihood= -298.1214 
  n= 172

I can also plot the survival function

plot(survfit(coxtime),xscale=365.25, xlab = "Years", ylab="Survival") 

Survival function of heart dataset Now I want to perform the same analysis with a tree-like algorithm.

ipredbagg

When I try the ippredbag-function, this function works fine without the left truncation:

library(survival)
library(ipred)
#without left truncation
ipredbagg(Surv(heart$stop, heart$event) ,X=heart$surgery)

I get the result

Bagging survival trees with 25 bootstrap replications

Because there are rows in the heart set where the start value is 0, I get an error when I simply enter the start and stop into the ipredbagg function.

#with left truncation
ipredbagg(Surv(heart$start, heart$stop, heart$event) ,X=heart$surgery)

Error in get(paste("rpart", method, sep = "."), envir = environment())(Y,  : 
Observation time must be > 0

Therefore I add one to both the start and stop columns, but now I get another error.

#with left truncation and start > 0
ipredbagg(Surv(heart$start+1, heart$stop+1, heart$event) ,X=heart$surgery)

The error:

Error in table(index2, levels = 1:ngrp) : 
  all arguments must have the same length

pecCforest

My second try is the pecCforest-function from the pec-package. This function also works on the data without left truncation. library(pec) library(party) library(survival)

#without left truncation
fitcforest <- pecCforest(Surv(stop, event) ~ .,data=heart[,-which(names(heart)=="start")],
                         controls = cforest_classical(ntree=100),mtry=2);
predictSurvProb(fitcforest,heart[1,],times=1)

I get the result

          [,1]
[1,] 0.9890595

This time I can train the model on the start + stop columns without an error, but I'm unable to get a prediction out of it.

#with left truncation
fitcforest <- pecCforest(Surv((start), (stop), event) ~ .,data=heart,
                         controls = cforest_classical(ntree=100));
predictSurvProb(fitcforest,heart[1,],times=1)

This results in

Error in predict.survfit(object, newdata = newdata, times = times, bytimes = TRUE,  : 
Predictions only available 
for class 'survfit', possibly stratified Kaplan-Meier fits.
For class 'cph' Cox models see survest.cph.

Adding 1 to both columns start and stop results in the same error.

#with left truncation and start > 0
fitcforest <- pecCforest(Surv((start+1), (stop+1), event) ~ .,data=heart,
                         controls = cforest_classical(ntree=100,mtry=2));
predictSurvProb(fitcforest,heart[1,],times=1)


Error in predict.survfit(object, newdata = newdata, times = times, bytimes = TRUE,  : 
  Predictions only available 
for class 'survfit', possibly stratified Kaplan-Meier fits.
 For class 'cph' Cox models see survest.cph.

Is there a way to get these functions work on left-truncated data? It seems that left truncation is not implemented in both functions, but I cannot find information about that. Is there an alternative way to do survivalanalysis on left truncated data in R with an algoritm based on trees (I managed to do a standard Cox regression)?


Solution

  • You can try recent package LTRCtrees on CRAN, which is designed for left-truncated survival data