Tags: python, survival-analysis, scikit-survival

How to use decision trees for survival analysis?


I have trouble understanding and applying decision trees for survival analysis in Python. I have a dataset with the variables age, weight, size of tumor, volume, ... (all floats), and I want to know whether they are correlated with overall survival (also a float).

But how can I apply decision trees to that? In the literature, I have only seen examples where y_train is a categorical variable (such as 0 or 1, benign or malignant, ...), which does not work for continuous variables like floats.

However, I want to create a decision tree so that in the end I can find out that, with a tumor size > xx and a volume > yy, the predicted overall survival is about < zzz.

Can someone help me with this problem? Does anyone have an idea where to read more about this topic?


Solution

  • The scikit-survival package provides ensemble decision tree models such as RandomSurvivalForest, as well as classical models such as the Cox model, CoxPHSurvivalAnalysis.

    The docs provide a good code example. Regarding the target variable y, at least in this case the documentation states

    y – A structured array containing the binary event indicator as first field, and time of event or time of censoring as second field.
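As a minimal sketch of what that description of y means in practice, the snippet below builds such a structured array with plain NumPy. The toy values and the field names "event" and "time" are assumptions made for illustration; what matters according to the quoted documentation is the order of the fields (boolean event indicator first, time second).

```python
import numpy as np

# Hypothetical toy data: overall survival time per patient, and whether
# the event (death) was actually observed (True) or the patient was
# censored, i.e. still alive at last follow-up (False).
survival_time = np.array([12.5, 30.0, 7.2, 45.1])
event_observed = np.array([True, False, True, False])

# Structured array: event indicator as the first field, time as the second.
# The field names are an arbitrary choice for this sketch.
y = np.empty(len(survival_time), dtype=[("event", bool), ("time", float)])
y["event"] = event_observed
y["time"] = survival_time

print(y.dtype.names)  # field order: event first, then time
print(y)
```

An array of this shape, together with a feature matrix X (age, weight, tumor size, ...), is what the scikit-survival estimators take in fit(X, y). For the single-tree case asked about in the question, scikit-survival also has a SurvivalTree class in sksurv.tree. Check the package docs for its parameters before relying on it.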