Tags: r, random-forest, party

cforest in package party returning Inf for all predictions


I am trying to use the cforest function in the R package party to analyse some right-censored survival data. Every time I use the predict function I get Inf for each value, which means that a concordance index cannot be generated.

My data can be downloaded here: https://www.dropbox.com/s/nt9s3p1rdafq465/test_data.csv?dl=0

Example:

library(party)
library(survival)

mydata <- read.csv(file = "test_data.csv", header = TRUE, sep = ",", row.names = NULL)
train <- head(mydata, n = 800)
test <- tail(mydata, n = 37)

cif_result <- cforest(Surv(timeToEvent, status) ~ V1 + V2 + V3 + V4 + V5 + V6,
                      data = train,
                      control = cforest_classical())

cforest_pred <- predict(object = cif_result, newdata = test)
cforest_pred

837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 
Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf 
857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 
Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf 

Am I doing something wrong? Why does cforest only predict Inf on this data?


Solution

  • The predict() method for survival trees/forests in the party package returns the median survival time. Because events are observed for fewer than 20% of the observations, the estimated survival curve does not drop to 0.5 here, so a finite median survival time cannot be computed and Inf is returned instead. As an example, consider the full-sample fit:

    m <- survfit(Surv(timeToEvent, status) ~ 1, data = train)
    plot(m)
    

    [Figure: Kaplan-Meier survival curve produced by plot(m)]
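The effect described above can be reproduced without the original file. The following is only a minimal sketch on made-up data: the data frame `toy` and its values are hypothetical, and it uses `survfit()` and `quantile()` from the survival package rather than party. With 15 events among 100 subjects and no earlier censoring, the Kaplan-Meier estimate stays well above 0.5, so no median exists. Note that survival reports the undefined median as NA, whereas the answer above describes party's predict() reporting it as Inf.

```r
library(survival)

# Hypothetical synthetic data (not the asker's file):
# 100 subjects, only the first 15 have an observed event,
# the remaining 85 are right-censored at later times.
toy <- data.frame(
  timeToEvent = 1:100,
  status      = c(rep(1, 15), rep(0, 85))
)

# Kaplan-Meier fit without covariates, as in the answer's survfit() call
m_toy <- survfit(Surv(timeToEvent, status) ~ 1, data = toy)

# Lowest value the estimated survival curve reaches
min_surv <- min(m_toy$surv)

# Median survival time: undefined because the curve never reaches 0.5
med <- unname(quantile(m_toy, probs = 0.5)$quantile)

print(min_surv)
print(med)
```

Checking `min(m$surv)` on the real training data in the same way would show how far the curve actually drops, and therefore which quantiles (if any) are estimable.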