
median VS survival median (from survfit)


This may seem like a silly question, but I was wondering why the median from median() and the median from survfit() (survival package) are different.

I tried to simulate the tutorial in sciencing.com:

List the survival time of all the subjects in your sample. For example, if you have five students (in a real study, you'd have more) and their times to graduation were 3 years, 4 years (so far), 4.5 years, 3.5 years and 7 years (so far), write down the times: 3, 4, 4.5, 3.5, 7.

Put a plus sign (or other mark) next to any times that are right-censored (that is, those that have not had the event happen yet). Your list would look like this: 3, 4+, 4.5, 3.5, 7+.

So I created a data.frame (T for dead and F for alive):

survive <- data.frame(OS = c(3,4,4.5,3.5,7), status = c(T,F,T,T,F))

The median is 4, as sciencing.com says:

median(survive$OS)
[1] 4

But when I run a survival analysis with the survival package, I get this:

Call: survfit(formula = Surv(OS, status) ~ 1, data = survive)

      n  events  median 0.95LCL 0.95UCL 
    5.0     3.0     4.5     3.5      NA

So my question is: why are these two medians different?

thanks


Solution

  • Remember that the times you have are not survival times - they are follow up times. Two of the individuals are right-censored, meaning that we do not know what happened to them after their follow up time.

    Suppose we plot your survival curve:

     library(survival)
     plot(Surv(survive$OS, survive$status))
    


    This plot shows the Kaplan-Meier estimate of the proportion still alive over time. It drops in steps as people die, but when we lose someone to follow up, the estimated survival does not change at that point (why should the fact that we lose someone to follow up change our estimated survival at that point?).
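To see the same thing in numbers rather than on the plot, here is a minimal sketch that lists the estimate at every follow-up time, including the censored ones. It assumes the survival package is installed and reuses the data frame from the question; the names fit, km and km_table are only illustrative.

```r
library(survival)

# Data from the question: TRUE = died, FALSE = still alive (right-censored)
survive <- data.frame(OS = c(3, 4, 4.5, 3.5, 7),
                      status = c(TRUE, FALSE, TRUE, TRUE, FALSE))

fit <- survfit(Surv(OS, status) ~ 1, data = survive)

# censored = TRUE keeps the rows for times at which someone was only censored
km <- summary(fit, censored = TRUE)

km_table <- data.frame(time     = km$time,
                       n.risk   = km$n.risk,
                       n.event  = km$n.event,
                       survival = km$surv)
print(km_table)
```

In the printed table, the row for time 4 has no event, and its survival value is the same as in the row before it.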

    Now let's use this plot to answer the question "After how long does the observed survival rate fall below 50%?" We can draw a horizontal line at y = 0.5 and see at what value this line crosses the survival curve:

    abline(h = 0.5, lty = 2, col = "red")
    abline(v = 4.5, lty = 2, col = "red")
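The crossing point can also be read off without a plot. The sketch below assumes the survival package is installed and reuses the data from the question. It takes the first event time at which the estimated survival is at or below 0.5, and compares that with the median reported by survfit(). The variable names are illustrative.

```r
library(survival)

survive <- data.frame(OS = c(3, 4, 4.5, 3.5, 7),
                      status = c(TRUE, FALSE, TRUE, TRUE, FALSE))

fit <- survfit(Surv(OS, status) ~ 1, data = survive)
km  <- summary(fit)   # one row per event time

# First event time at which the estimated survival is at or below 50%
first_crossing <- min(km$time[km$surv <= 0.5])

# Median as reported in the printed survfit output
reported_median <- summary(fit)$table[["median"]]

print(c(first_crossing = first_crossing, reported_median = reported_median))
```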
    


    We can see that the estimated survival falls below 50% at 4.5 years, so this is the median survival time. Count the individuals at each point and we can see this:

    • Time = 0: We have 5 people in our sample, all of whom are alive (survival = 100%)
    • Time = 3: We have 5 people in our sample, one of whom has died (survival = 80%)
    • Time = 3.5: We have 5 people in our sample, two of whom have died (survival = 60%)
    • Time = 4: We have 4 people in our sample, since we lost one to follow up. The fact that we lost this person to follow up cannot affect the estimated survival at that point, so survival remains at 60%. Note that if the person had died instead of being lost to follow up, survival would have dropped to 40% and the median survival would indeed have been 4 years.
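    • Time = 4.5: A third person dies. Only two people were still being followed just before this point (those with times 4.5 and 7+), so the estimate halves, from 60% to 30%. This is the first time survival is at or below 50%, so the median survival is 4.5 years.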
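The counting above can be reproduced by hand in base R, without the survival package. This is only a sketch of the Kaplan-Meier product for this particular data set: it assumes there are no tied times, and the names at_risk, step, km_table and km_median are made up for the illustration.

```r
survive <- data.frame(OS = c(3, 4, 4.5, 3.5, 7),
                      status = c(TRUE, FALSE, TRUE, TRUE, FALSE))

# Sort by follow-up time
survive <- survive[order(survive$OS), ]
n <- nrow(survive)

# Number still being followed just before each time (valid only without ties)
at_risk <- n - seq_len(n) + 1

# A death multiplies survival by (1 - 1/at_risk); a censoring multiplies it by 1
step <- 1 - survive$status / at_risk
km   <- cumprod(step)

km_table <- data.frame(time = survive$OS, at_risk = at_risk,
                       died = survive$status, survival = km)
print(km_table)

# Median survival: first time the estimate is at or below 50%
km_median <- min(km_table$time[km_table$survival <= 0.5])

# Plain median of the follow-up times, which ignores censoring
naive_median <- median(survive$OS)

print(c(km_median = km_median, naive_median = naive_median))
```

The two numbers differ because median() treats every follow-up time as if it were a time of death, while the product above only lets the estimate drop when a death is actually observed.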