r survival-analysis cox-regression survival

Accelerated Failure Time modelling: plotting survival probabilities with CI-s (example provided)

As proportional hazards assumption is violated with my real data, I am using an AFT model, trying to calculate adjusted survival probabilities for study groups in interest. The example below is on kidney data and I tried to follow ciTools vignette.

library(tidyverse)
library(ciTools)
library(here)
library(survival)
library(survminer)

#data
kidney

Model

fit1 =  survreg(Surv(time, censored) ~ sex + age + disease, data = kidney)


Call:
survreg(formula = Surv(time, censored) ~ sex + age + disease, 
    data = kidney)

Coefficients:
(Intercept)   sexfemale         age   diseaseGN   diseaseAN  diseasePKD 
 8.41830937 -0.93959839 -0.01578812 -0.25274448 -0.38306425 -0.32830433 

Scale= 1.642239 

Loglik(model)= -122.1   Loglik(intercept only)= -122.7
    Chisq= 1.33 on 5 degrees of freedom, p= 0.931 
n= 76

Adding survival probabilities for both sexes for surviving at least 365 days

probs = ciTools::add_probs(kidney, fit1, q = 365,
                        name = c("prob", "lcb", "ucb"),
                        comparison = ">")
probs

Trying to plot one-year survival probabilities for both sexes, but there are multiple point estimates for geom_point?

It seems for me that these point estimates are given for each age value. Can I edit the prediction so that it is made for mean or median age?

probs %>% ggplot(aes(x = prob, y = sex)) +
    ggtitle("1-year survival probability") +
    xlim(c(0,1)) +
    theme_bw() +
    geom_point(aes(x = prob), alpha = 0.5, colour = "red")+
    geom_linerange(aes(xmin = lcb, xmax = ucb), alpha = 0.5)

However, this approach seems to work with a simple model

fit2 =  survreg(Surv(time, censored) ~ sex, data = kidney)

probs2 = ciTools::add_probs(kidney, fit2, q = 365,
                            name = c("prob", "lcb", "ucb"),
                            comparison = ">")

probs2 %>% ggplot(aes(x = prob, y = sex)) +
    ggtitle("1-year survival probability") +
    xlim(c(0,1)) +
    theme_bw() +
    geom_point(aes(x = prob), alpha = 0.5, colour = "red")+
    geom_linerange(aes(xmin = lcb, xmax = ucb), alpha = 0.5)

Questions:

How can I get adjusted survival probabilities for both sexes? Or if this is impossible, what would be possible alternatives? Code would help with alternatives.
If I would like to get adjusted survival probabilities for both sexes and for different time points, should I edit "q" value in ciTools::add_probs() function? For example: q = 30 for one month; q = 90 for three months etc. Or I should run a separate model for each time period?

Solution

The way you have set up these models, the predictions are being returned for each individual in the kidney data set, based on the covariate values included in the model.

In your first model, you have included sex + age + disease so that you get a prediction for each combination of those 3 covariate values in your data set.

In the second model, you have only included sex as a predictor, so you only get predictions based on sex.

Survival model prediction functions allow you to specify a set of covariate values from which to predict in a new data frame. According to the manual for add_probs.survreg, you do so by specifying a new data frame with specified covariate values in the df argument to the function. You used the kidney data frame there, so you got predictions for all those cases.

I'm not familiar with ciTools::add_probs specifically, but such software typically will (without warning) accept your values for the covariates you specify and then use some type of "average" value for the covariates that you don't specify. As "averages" don't have much meaning for categorical covariates like disease, it's usually better to specify a complete useful set of values for all covariates yourself.

The functions in the rms package in R often do a better job at providing useful predictions, as they choose typical rather than "average" values for unspecified covariates, based on an initial evaluation of the data set by a datadist() function whose output you then specify as a system option. The learning curve for this package is a bit steep but well worth it if you will be doing a lot of survival or other regression modeling.