I am using the pysurvival
library to model with the Cox proportional hazard model (CPH). Instead of getting the survival curves, I am interested in getting point predictions. In the library, the function predict_survival
returns an array-like representing the prediction of the survival function which I assume that I can use to get the expected values - but I just cant find the right way.
Below I've attached a dummy example.
# Initializing the simulation model
sim = SimulationModel( survival_distribution = 'log-logistic',
risk_type = 'linear',
censored_parameter = 10.1,
alpha = 0.1, beta=3.2 )
# Generating N random samples
N = 200
dataset = sim.generate_data(num_samples = N, num_features = 4)
# Defining the features
features = sim.features
# Creating the X, T and E input
X, T, E = dataset[features], dataset['time'].values, dataset['event'].values
# Building the model
coxph = CoxPHModel()
coxph.fit(X,T,E, lr=0.5, l2_reg=1e-2, init_method='zeros')
When applying the function:
coxph.predict_survival(x=X)
It returns an array of arrays with the shape (200, 87)
- why does it give exactly 87 values for every observation?
As far as I understand I should be able to get the expected value by taking the integral below the curve of the survival curve.
To do this I need to calculate the area under the curve which I think can be done using trapz
in the numpy
library, but I need to know how the spacing between the points are done.
As mentioned we can use the function predict_survival
to get the estimated survival probability. Furthermore, by calling coxPH.times
we obtain the time of every estimated survival probability, and thereby for example can plot the individual survival curve for every observation and calculate the area under the curve. By using the auc
function from the sklearn.metrics
library the following definition gives point predictions for the training and test set given a CPH model, the X_test
data and X_train
data:
## Get point predictions
def point_pred(model, X_test, X_train):
T_pred = []
T_pred_train = []
# Get survival curves
cph_pred = model.predict_survival(X_test)
cph_pred_train = model.predict_survival(X_train)
# get times of survival prediction
time = model.times
# test
for i in range(0,len(cph_pred)):
T_pred.append(auc(time,cph_pred[i]))
# train
for i in range(0,len(cph_pred_train)):
T_pred_train.append(auc(time,cph_pred_train[i]))
return T_pred, T_pred_train