r survival

survfit outputs NA upper bounds but median survival is reached


The survival curves for the following data drop to 50%, so median survival is reached, but the upper confidence bounds for the median are reported as NA. I cannot figure out why the upper bounds are NA when the survival histories are valid.

[Figure: survival plot]

.data = structure(list(event_occurred = c(TRUE, FALSE, TRUE, FALSE, TRUE, 
TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, 
FALSE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, 
FALSE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, 
FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, 
TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, 
FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, 
TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, 
TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, 
FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, 
FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, 
FALSE, FALSE, FALSE), fu_time = c(190, 961, 483, 112, 56, 248, 
127, 366, 80, 105, 158, 127, 1464, 217, 1700, 530, 1363, 1360, 
963, 557, 668, 813, 504, 422, 147, 989, 872, 828, 419, 1071, 
1155, 1791, 368, 667, 147, 1643, 1012, 874, 941, 52, 379, 333, 
987, 30, 230, 166, 336, 764, 332, 4, 314, 1113, 1053, 1072, 91, 
1693, 766, 1362, 265, 646, 780, 164, 1070, 142, 1772, 313, 1925, 
118, 964, 988, 1980, 1172, 1358, 1694, 1751, 1754, 309, 1766, 
889, 390, 1601, 9, 800, 396, 249, 224, 363, 151, 1500, 1913, 
126, 280, 1734, 182, 220, 929, 330, 39, 56, 1077, 455, 924, 811, 
475, 43, 205, 504, 748), strat = c("A", "A", "A", "B", "B", "B", 
"B", "A", "B", "B", "B", "A", "B", "B", "B", "A", "A", "A", "B", 
"A", "B", "A", "A", "B", "A", "B", "A", "A", "A", "A", "A", "A", 
"A", "A", "B", "A", "A", "B", "A", "B", "A", "B", "A", "B", "A", 
"B", "B", "B", "A", "B", "A", "A", "A", "B", "B", "A", "A", "A", 
"A", "A", "A", "A", "B", "A", "A", "B", "B", "B", "B", "B", "A", 
"A", "A", "A", "A", "A", "B", "A", "A", "A", "A", "B", "A", "A", 
"B", "B", "B", "B", "A", "B", "B", "A", "A", "B", "B", "A", "B", 
"B", "B", "A", "A", "A", "A", "A", "B", "B", "B", "A")), row.names = c(NA, 
-108L), class = c("tbl_df", "tbl", "data.frame"))

survival::survfit(survival::Surv(time = fu_time, event = event_occurred) ~ strat, data = .data)

Solution

  • This is a non-parametric analysis. Most of the deaths occurred before the estimated median times were reached, which supports the calculation of a 95% LCL. But very few events occurred in either group after the median, so the 95% UCL cannot be estimated and remains speculative. Confidence intervals in non-parametric modeling depend crucially on the number of events supporting an estimate. (Not really a proper SO question; it should have been posted at stats.stackexchange.com.)
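
The effect described above can be reproduced on a small made-up dataset. The sketch below is only an illustration: the data frame `toy` and its values are invented and are not the data from the question. It relies on my understanding that survfit derives the confidence limits of the median from the confidence band of the survival curve, so the upper limit is the first time at which the upper band falls to 0.5 or below. If events stop shortly after the median, the upper band never gets that low and the limit is NA.

```r
library(survival)

# Hypothetical toy data (not from the question): 9 subjects,
# 5 events at times 1..5, then 4 censored observations.
toy <- data.frame(
  fu_time        = 1:9,
  event_occurred = c(rep(TRUE, 5), rep(FALSE, 4))
)

fit <- survfit(Surv(fu_time, event_occurred) ~ 1, data = toy)

# Median survival and its confidence limits.
med <- quantile(fit, probs = 0.5)
print(med)

# The estimated curve drops below 0.5, so the median exists ...
print(min(fit$surv))

# ... but the upper confidence band stays above 0.5 for all
# observed times, so there is no time to report as the upper limit.
print(min(fit$upper))
```

The same check, `min(fit$upper)` per stratum, can be applied to the fit from the question to see whether the upper band ever reaches 0.5 in either group.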