Search code examples
pythonsurvival-analysiscox-regressionsurvival

Making point predictions using Cox proportional hazard


I am using the pysurvival library to model with the Cox proportional hazard model (CPH). Instead of getting the survival curves, I am interested in getting point predictions. In the library, the function predict_survival returns an array-like representing the prediction of the survival function which I assume that I can use to get the expected values - but I just cant find the right way.

Below I've attached a dummy example.

# Initializing the simulation model
sim = SimulationModel( survival_distribution = 'log-logistic',
                       risk_type = 'linear',
                       censored_parameter = 10.1,
                       alpha = 0.1, beta=3.2 )
# Generating N random samples 
N = 200
dataset = sim.generate_data(num_samples = N, num_features = 4)
# Defining the features
features = sim.features
# Creating the X, T and E input
X, T, E = dataset[features], dataset['time'].values, dataset['event'].values
# Building the model
coxph = CoxPHModel()
coxph.fit(X,T,E, lr=0.5, l2_reg=1e-2, init_method='zeros')

When applying the function:

coxph.predict_survival(x=X)

It returns an array of arrays with the shape (200, 87) - why does it give exactly 87 values for every observation?

As far as I understand I should be able to get the expected value by taking the integral below the curve of the survival curve.

To do this I need to calculate the area under the curve which I think can be done using trapz in the numpy library, but I need to know how the spacing between the points are done.


Solution

  • As mentioned we can use the function predict_survival to get the estimated survival probability. Furthermore, by calling coxPH.times we obtain the time of every estimated survival probability, and thereby for example can plot the individual survival curve for every observation and calculate the area under the curve. By using the auc function from the sklearn.metrics library the following definition gives point predictions for the training and test set given a CPH model, the X_test data and X_train data:

    ## Get point predictions
    def point_pred(model, X_test, X_train):
        T_pred = []
        T_pred_train = []
        # Get survival curves
        cph_pred = model.predict_survival(X_test)
        cph_pred_train = model.predict_survival(X_train)
        # get times of survival prediction
        time = model.times
        # test
        for i in range(0,len(cph_pred)):
            T_pred.append(auc(time,cph_pred[i]))
        # train
        for i in range(0,len(cph_pred_train)):
            T_pred_train.append(auc(time,cph_pred_train[i]))
        
        return T_pred, T_pred_train