Search code examples
rsurveysurvival

Difference in KM estimation: svykm vs survfit with weights


I am estimating survival with weights in a case control study. So all cases have weight equal to 1. When drauwing the estimated curves and comparing them to unweighted estimation I noticed that KM curves for cases dont't overlap.

Here's the code for data from "survival" package.

library(dplyr)
library(tidyverse)
library(survival)
library(broom)
library(WeightIt)
library(survey)


a <- survival::ovarian

#Calculation of weghts:

weights <- WeightIt::weightit(rx ~ age + ecog.ps + resid.ds, int = T, estimand = "ATT", data = a, method = "glm" , stabilize = F, missing = "saem") 
a$weights <- weights$weights
a$ps <- weights$ps

design <- svydesign(ids = ~ 1, data = a, weights = ~weights)

KM_PFS <- survfit(Surv(futime, fustat > 0)~rx, a) # KM naive 
KM_PFS_w_TT <- survfit(Surv(futime, fustat > 0)~rx, a, weights = weights, robust = T)
KM_PFS_w <- svykm(Surv(futime, fustat > 0)~rx, design = design,se=T)


par(mfrow=c(1,1))
plot(KM_PFS_w[[2]], lwd=2, col=c("red"),xlab="Time (months)",ylab="PFS",#svykm treated
     xaxt="n", ci=F)
#lines(KM_PFS_w[[1]],col=c("blue"),lwd=2)
lines(KM_PFS,col=c("black","black"),lwd=2,lty=c(0,2)) #km naive treated
lines(KM_PFS_w_TT,col=c("orange","violet"),lwd=2,lty=c(0,1))#km TT treated

cas_km_w_TT <- tidy(KM_PFS_w_TT)%>%filter(strata == "rx=1")
cas_km <- tidy(KM_PFS)%>%filter(strata == "rx=1")
cas_km_w <- do.call("rbind", lapply(names(KM_PFS_w), \(x) {
  data.frame(strata = x, do.call("cbind", KM_PFS_w[[x]]))
})) %>% filter(strata ==1)

Solution

  • This has to do with the calculation of svykm(..., se=TRUE) not being the same as the one without -- if you don't ask for standard errors there the curves will match. As the documentation of svykm explains:

    When standard errors are computed, the survival curve is actually the Aalen (hazard-based) estimator rather than the Kaplan-Meier estimator.

    Also note that these packages do not use the same "weights", here is the author of survey explaining the difference (but this will only matter for variance estimation).