This may seem like a silly question, but I was wondering why the median from `median()` and the median from `survfit()` (survival package) are different.
I tried to reproduce the tutorial on sciencing.com:
> List the survival time of all the subjects in your sample. For example, if you have five students (in a real study, you'd have more) and their times to graduation were 3 years, 4 years (so far), 4.5 years, 3.5 years and 7 years (so far), write down the times: 3, 4, 4.5, 3.5, 7.
>
> Put a plus sign (or other mark) next to any times that are right-censored (that is, those that have not had the event happen yet). Your list would look like this: 3, 4+, 4.5, 3.5, 7+.
So I created a data.frame (`T` for dead and `F` for alive):
```r
survive <- data.frame(OS = c(3, 4, 4.5, 3.5, 7), status = c(T, F, T, T, F))
```
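For comparison, the survival package can hold the same follow-up times and status flags in a `Surv` object, and when printed it marks right-censored times with the same plus sign the tutorial uses. A minimal sketch, assuming the survival package is installed (it ships with standard R distributions):

```r
library(survival)

# Same data as above: TRUE = died, FALSE = still alive (right-censored)
survive <- data.frame(OS = c(3, 4, 4.5, 3.5, 7),
                      status = c(TRUE, FALSE, TRUE, TRUE, FALSE))

# Surv() pairs each follow-up time with its status;
# printing it appends "+" to the censored times (4 and 7 here)
s <- with(survive, Surv(OS, status))
print(s)
```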
The median is 4, as sciencing.com says:
```r
median(survive$OS)
[1] 4
```
But when I run the survival analysis with the survival package, I get this:
```
Call: survfit(formula = Surv(OS, status) ~ 1, data = survive)

      n  events  median  0.95LCL  0.95UCL
    5.0     3.0     4.5      3.5       NA
```
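The median in that printout can also be pulled out of the fitted object, together with the step-by-step table it is computed from. A short sketch using the survival package's `summary()` and `quantile()` methods for `survfit` objects; the data frame is recreated so the block runs on its own:

```r
library(survival)

# Recreate the question's data (TRUE = died, FALSE = censored)
survive <- data.frame(OS = c(3, 4, 4.5, 3.5, 7),
                      status = c(TRUE, FALSE, TRUE, TRUE, FALSE))
fit <- survfit(Surv(OS, status) ~ 1, data = survive)

# Kaplan-Meier table at each death time: n.risk, n.event, survival, CI
summary(fit)

# The median shown by print(fit) is stored in the summary table
km_median <- summary(fit)$table[["median"]]
km_median  # 4.5

# quantile() on a survfit object reports the same value for probs = 0.5
quantile(fit, probs = 0.5)
```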
So my question is: why are these two medians different?

Thanks
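Remember that the times you have are not survival times; they are follow-up times. Two of the individuals are right-censored, meaning that we do not know what happened to them after their follow-up time. `median()` ignores this and treats 4 and 7 as if they were times of death, whereas the Kaplan-Meier estimate behind `survfit()` uses the fact that those two people were still alive at 4 and 7 years.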
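To see how much the handling of the two censored people matters, here is a quick base-R comparison of two naive summaries of the same numbers. Neither is a valid survival median; they only illustrate the two ways of getting censoring wrong:

```r
# Follow-up times and status (TRUE = died, FALSE = censored), as in the question
OS     <- c(3, 4, 4.5, 3.5, 7)
status <- c(TRUE, FALSE, TRUE, TRUE, FALSE)

# Naive 1: treat every follow-up time as a death time (what median() effectively does)
naive_all <- median(OS)             # 4

# Naive 2: drop the censored people, throwing away the information
# that they survived at least 4 and 7 years
naive_deaths <- median(OS[status])  # 3.5
```

Both naive values come out below the median of 4.5 that `survfit()` reports, because each one understates how long the censored people are known to have survived.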
Suppose we plot your survival curve:
```r
library(survival)
plot(survfit(Surv(OS, status) ~ 1, data = survive))
```
This plot shows the estimated proportion of survivors, based on the people we are still actively following up. It drops in steps as people die. If we lose someone to follow-up, the estimated survival does not change at that point (why should the fact that we lose someone to follow-up change our estimated survival at that point?), but that person no longer counts among those at risk, so each later death causes a bigger drop.
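This step rule is the Kaplan-Meier (product-limit) estimator. Writing it out for the five follow-up times in the question makes the arithmetic explicit, where $n_i$ is the number still being followed just before time $t_i$ and $d_i$ is the number who die at $t_i$:

$$
\hat S(t) = \prod_{t_i \le t} \left(1 - \frac{d_i}{n_i}\right)
$$

$$
\begin{aligned}
\hat S(3)   &= 1 - \tfrac{1}{5} = 0.8 \\
\hat S(3.5) &= 0.8 \times \left(1 - \tfrac{1}{4}\right) = 0.6 \\
\hat S(4)   &= 0.6 \quad \text{(censored: no drop, but the number at risk falls from 3 to 2)} \\
\hat S(4.5) &= 0.6 \times \left(1 - \tfrac{1}{2}\right) = 0.3 \\
\hat S(7)   &= 0.3 \quad \text{(censored: no drop)}
\end{aligned}
$$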
Now let's use this plot to answer the question "After how long does the estimated survival fall below 50%?" We can draw a horizontal line at y = 0.5 and see at what time this line crosses the survival curve:
```r
abline(h = 0.5, lty = 2, col = "red")  # 50% survival
abline(v = 4.5, lty = 2, col = "red")  # time at which the curve drops below it
```
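The same reading can be done numerically. Below is a minimal base-R sketch that recomputes the Kaplan-Meier steps by hand and takes the first time at which the estimate is at or below 0.5. The function `km_by_hand` is hypothetical, written only for illustration: it handles right-censored data only, omits confidence intervals, and ignores the special case that `survfit()` handles when a curve sits exactly at 0.5 over an interval.

```r
# Follow-up times and status (TRUE = died, FALSE = censored), as in the question
OS     <- c(3, 4, 4.5, 3.5, 7)
status <- c(TRUE, FALSE, TRUE, TRUE, FALSE)

# Hypothetical helper: Kaplan-Meier estimate at each distinct death time
km_by_hand <- function(time, event) {
  ord   <- order(time)          # sort so the cumulative product runs in time order
  time  <- time[ord]
  event <- event[ord]
  death_times <- unique(time[event])
  # people still under follow-up just before each death time
  n_risk  <- sapply(death_times, function(t) sum(time >= t))
  # deaths at each death time
  n_event <- sapply(death_times, function(t) sum(time == t & event))
  data.frame(time = death_times, n.risk = n_risk, n.event = n_event,
             surv = cumprod(1 - n_event / n_risk))
}

km <- km_by_hand(OS, status)
print(km)

# Median survival: first time the estimate is at or below 0.5 (NA if it never gets there)
km_median <- km$time[which(km$surv <= 0.5)[1]]
km_median  # 4.5
```

Censored people never appear as rows here; they only shrink `n.risk` for later death times, which is exactly why the drop at 4.5 years is from 0.6 to 0.3.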
We can see that the estimated survival falls below 50% at 4.5 years, so this is the median survival time. Counting the individuals at risk at each time point, we can see this: