Search code examples
rlistplotsurvival-analysissurvival

How to get the probability data from survfit function?


I plot cumulative incidence curves based on data from the survival::survfit() function. Following this vignette I do something like the following:

library(survival)
mgus2$etime <- with(mgus2, ifelse(pstat==0, futime, ptime))
event <- with(mgus2, ifelse(pstat==0, 2*death, 1))
mgus2$event <- factor(event, 0:2, labels=c("censor", "pcm", "death"))
mfit2 <- survfit(Surv(etime, event) ~ sex, data=mgus2)
plot(mfit2)

four groups

The data has two possible events (pcm and death) and the included covariate sex has two levels (male and female). Thus, as expected, the resulting plot depicts the the probabilities for four lines (I left out the labels because they don't matter here).

But how to extract this data? I looked everywhere inside mfit2 but can not find the data needed to create the plot. There is mfit2$time for the x axis, but mfit2$pstate does not include all four probabilities (pcm female, pcm male, death female and death male). Where are those probabilities hidden in mfit2?

I found this answer which suggests some workaround in order to get the data. I see that the workaround works, that's not the question. Where is the data for the plot inside mfit2? It must be somewhere since plot(mfit2) returns as the plot we would expect.


Solution

  • You're after the $pstate, which contains the concatenated probabilities for the two levels of sex (females are the first level).

    levels(mgus$sex)
    #[1] "female" "male" 
    

    The length of mfit2$pstate (and mfits$time) is 454, the first 227 corresponds to probabilities for females (red).

    mfit2$strata
    sex=F sex=M 
      227   227
    

    You can see that this is correct by plotting the fitted object and overlaying the lines based on the above information.

    plot(mfit2)
    lines(mfit2$time[1:227], mfit2$pstate[1:227, 2], type="s", col="red", lty=2)
    lines(mfit2$time[228:454], mfit2$pstate[228:454, 2], type="s", col="green", lty=2)
    lines(mfit2$time[1:227], mfit2$pstate[1:227, 3], type="s", col="red", lty=2)
    lines(mfit2$time[228:454], mfit2$pstate[228:454, 3], type="s", col="green", lty=2)
    

    enter image description here