As proportional hazards assumption is violated with my real data, I am using an AFT model, trying to calculate adjusted survival probabilities for study groups in interest. The example below is on kidney data and I tried to follow ciTools vignette.
library(tidyverse)
library(ciTools)
library(here)
library(survival)
library(survminer)
#data
kidney
Model
fit1 = survreg(Surv(time, censored) ~ sex + age + disease, data = kidney)
Call:
survreg(formula = Surv(time, censored) ~ sex + age + disease,
data = kidney)
Coefficients:
(Intercept) sexfemale age diseaseGN diseaseAN diseasePKD
8.41830937 -0.93959839 -0.01578812 -0.25274448 -0.38306425 -0.32830433
Scale= 1.642239
Loglik(model)= -122.1 Loglik(intercept only)= -122.7
Chisq= 1.33 on 5 degrees of freedom, p= 0.931
n= 76
Adding survival probabilities for both sexes for surviving at least 365 days
probs = ciTools::add_probs(kidney, fit1, q = 365,
name = c("prob", "lcb", "ucb"),
comparison = ">")
probs
Trying to plot one-year survival probabilities for both sexes, but there are multiple point estimates for geom_point?
It seems for me that these point estimates are given for each age value. Can I edit the prediction so that it is made for mean or median age?
probs %>% ggplot(aes(x = prob, y = sex)) +
ggtitle("1-year survival probability") +
xlim(c(0,1)) +
theme_bw() +
geom_point(aes(x = prob), alpha = 0.5, colour = "red")+
geom_linerange(aes(xmin = lcb, xmax = ucb), alpha = 0.5)
However, this approach seems to work with a simple model
fit2 = survreg(Surv(time, censored) ~ sex, data = kidney)
probs2 = ciTools::add_probs(kidney, fit2, q = 365,
name = c("prob", "lcb", "ucb"),
comparison = ">")
probs2 %>% ggplot(aes(x = prob, y = sex)) +
ggtitle("1-year survival probability") +
xlim(c(0,1)) +
theme_bw() +
geom_point(aes(x = prob), alpha = 0.5, colour = "red")+
geom_linerange(aes(xmin = lcb, xmax = ucb), alpha = 0.5)
Questions:
The way you have set up these models, the predictions are being returned for each individual in the kidney
data set, based on the covariate values included in the model.
In your first model, you have included sex + age + disease
so that you get a prediction for each combination of those 3 covariate values in your data set.
In the second model, you have only included sex
as a predictor, so you only get predictions based on sex.
Survival model prediction functions allow you to specify a set of covariate values from which to predict in a new data frame. According to the manual for add_probs.survreg
, you do so by specifying a new data frame with specified covariate values in the df
argument to the function. You used the kidney
data frame there, so you got predictions for all those cases.
I'm not familiar with ciTools::add_probs
specifically, but such software typically will (without warning) accept your values for the covariates you specify and then use some type of "average" value for the covariates that you don't specify. As "averages" don't have much meaning for categorical covariates like disease
, it's usually better to specify a complete useful set of values for all covariates yourself.
The functions in the rms package
in R often do a better job at providing useful predictions, as they choose typical rather than "average" values for unspecified covariates, based on an initial evaluation of the data set by a datadist()
function whose output you then specify as a system option. The learning curve for this package is a bit steep but well worth it if you will be doing a lot of survival or other regression modeling.