Tags: r, survival-analysis, survival

Different Kaplan-Meier Results with Interval2


I've noticed a slight difference in the survfit results when I use a survival object of type "interval2". What first caught my eye was that the number at risk in the interval2 fit is not an integer. I've stepped through survfit.formula and survfitKM, but I'm still not clear on exactly what's happening and why. Could anyone explain the difference to me? When debugging, it appears survfitKM is using some .05 weights (the casewt variable), but I'm not setting weights explicitly.

MRE:

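library('survival')

# Ordinary right-censored survival object.
# Note: lung codes status as 1 = censored, 2 = dead; this example treats status == 1 as the event.
surv_obj <- with(lung, Surv(time = time, event = status == 1))

# The same data in 'interval2' form: right equals left for an exact time, NA for right-censored
left <- lung$time
right <- ifelse(lung$status == 1, lung$time, NA)
surv_obj_int <- Surv(time = left, time2 = right, type = 'interval2')

# Fit both; the second fit reports a non-integer number at risk
surv_fit <- survfit(surv_obj ~ 1, type = 'kaplan-meier')
surv_fit_int <- survfit(surv_obj_int ~ 1, type = 'kaplan-meier')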
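To see the symptom directly, the two fits from the MRE can be compared side by side. This is only a rough sketch: it rebuilds the same objects, prints the type each Surv object is stored with, and lists the number at risk at the first few time points. The names fit_km and fit_int are made up for this illustration, and the exact n.risk values from the interval fit may depend on the installed version of survival.

```r
library(survival)

# Rebuild the two survival objects from the MRE
surv_obj <- with(lung, Surv(time = time, event = status == 1))
left  <- lung$time
right <- ifelse(lung$status == 1, lung$time, NA)
surv_obj_int <- Surv(time = left, time2 = right, type = "interval2")

# How each object is stored internally
print(attr(surv_obj, "type"))      # "right"
print(attr(surv_obj_int, "type"))  # interval2 input is stored in "interval" form

# fit_km and fit_int are illustrative names, not from the original post
fit_km  <- survfit(surv_obj ~ 1)
fit_int <- survfit(surv_obj_int ~ 1)

# Number at risk at the first few time points of each fit
print(head(cbind(time = fit_km$time,  n.risk = fit_km$n.risk)))
print(head(cbind(time = fit_int$time, n.risk = fit_int$n.risk)))
```

The right-censored fit should show whole-number risk sets, while the interval fit is where the fractional values appear.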


Solution

  • The repo owner was kind enough to explain.

    When there is interval censored data the code uses the Turnbull estimate (survfitTurnbull). In this case the number at risk is not well defined and the code uses a "working" value. In your particular example there are no interval censored observations, and if it was smarter the code would notice that and use survfitKM: more accurate and a lot faster. But users don't tend to use interval2 style unless they need it.

    So although I'm treating lung as interval censored, there are no actual intervals: whenever there is a right value, it equals the left.
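    Based on that explanation, one possible workaround is to check for real intervals yourself and fall back to an ordinary right-censored Surv object when there are none. The sketch below is my own, not code from the package: needs_turnbull, event and fit are hypothetical names, and it assumes right-censoring is coded as NA in right and left-censoring as NA in left.

```r
library(survival)

# Same construction as in the question
left  <- lung$time
right <- ifelse(lung$status == 1, lung$time, NA)

# Turnbull is only needed if some observation is left-censored (left is NA)
# or is a true interval (both bounds present and different).
needs_turnbull <- any(is.na(left)) ||
  any(!is.na(left) & !is.na(right) & left != right)

if (!needs_turnbull) {
  # Every observation is either exact (right == left) or right-censored,
  # so rebuild a plain right-censored object and get the usual KM fit.
  event <- !is.na(right)
  fit <- survfit(Surv(left, event) ~ 1)
} else {
  # Real intervals present: keep the interval2 form
  fit <- survfit(Surv(left, right, type = "interval2") ~ 1)
}

print(head(fit$n.risk))
```

    With the lung data this takes the first branch, so the number at risk comes back as whole numbers again.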