I am running a survival model with sampling weights and was hoping to get the weighted logrank test results. When I use the svylogrank()
function from the survey package, the results were puzzling. If I run the default svylogrank test, the returned chisq value seemed way too high, and when I run the function with the method set to "score", the results were much more reasonable and as expected. Based on my understanding of the parameter method
, it should only matter for performance and should not affect the results, and I believe my model matrix is also relatively small.
Could someone help advise on this issue?
D = data.table(unique_id = 1:135,
weights = rep(1,135),
event_time = c(0.53512437, 1.35655869, 2.00414189, 2.37276648, 3.20343526, 0.96618494, 2.57894309, 0.94575080, 1.25347833, 1.44416450, 5.04038200, 7.80587169 ,
6.53631154, 6.31914568, 7.00146597, 9.67616088, 7.94212358, 9.70693890, 10.67575835, 10.06764688, 12.29175616, 13.60092871, 13.12508566, 14.66417522,
15.35250691, 0.93368707, 0.19087611, 3.15533767, 4.40821633, 17.54334957, 17.95177642, 15.50903946, 16.48376185, 20.87956697, 21.24571398, 22.34297263,
23.36042629, 21.01760215, 23.84785038, 26.06105822, 4.16866350, 1.96922485, 0.66199008, 6.76987830, 1.55617685, 0.19095871, 3.13291784, 5.43159409,
9.55805671, 4.31437322, 0.78259860, 5.26415156, 3.45095686, 1.69128712, 8.41942426, 3.33748695, 6.08516173, 2.72897404, 0.22789783, 0.86348009,
2.35707587, 2.97477615, 12.33273800, 0.58532123, 0.14586238, 10.67948547, 4.07655972, 3.94405136, 0.37226898, 1.42558725, 1.47680658, 4.22506540,
1.56703478, 8.37484756, 12.54015087, 1.80994787, 3.66453633, 1.02834532, 1.99065652, 1.23577436, 16.21981618, 14.35039798, 4.15321606, 2.79740679,
0.35538726, 7.46823358, 1.66329088, 7.46525382, 2.62734831, 3.19057957, 0.33317193, 0.09122886, 9.14616245, 2.48542578, 2.37569263, 5.48499630,
2.22749399, 2.64816296, 0.97101545, 1.42468625, 1.27668904, 0.03692447, 1.98783210, 5.47692729, 3.88316178, 0.32921277, 1.77225345, 9.33268901,
2.44517775, 1.49813702, 2.56059172, 3.43194832, 1.22955630, 3.56263947, 9.07060099, 3.58312362, 2.22755370, 4.24783776, 3.46364804, 1.61671354,
11.10973565, 7.18764270, 1.80400046, 6.39833474, 6.72825192, 6.46063344, 5.76855531, 5.27157807, 4.66154734, 3.50019718, 2.27156678, 3.28531594,
2.35699896, 2.94956000, 8.85381736),
event_flag = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1,
1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1,
1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0),
group = c(rep("group1", 40), rep("group2", 95)))
svykm_formula = "Surv(event_time, event_flag) ~ group" %>% as.formula
svy_design = svydesign( ids = D[ , unique_id ], weights = D[ , weights], data = D )
svylogrank(formula = svykm_formula, design = svy_design)
svylogrank(formula = svykm_formula, design = svy_design, method = "score")
Results:
> svylogrank(formula = svykm_formula, design = svy_design)
[[1]]
score
[1,] -21.31458 3.99272 -5.338361 9.379041e-08
[[2]]
chisq p
2.849810e+01 9.379041e-08
attr(,"class")
[1] "svylogrank"
> svylogrank(formula = svykm_formula, design = svy_design, method = "score")
z.groupgroup2 Chisq p
1.85490656 3.44067835 0.06360957
attr(,"class")
[1] "svylogrank"
Yes, it's a bug (which will be fixed in the next version). Both method="large"
and method="score"
are correct, but the default method="small"
isn't.
The problem is that survival::coxph.detail
doesn't do quite what is documented for the at-risk matrix (or I've misunderstood it). However, if the observations are in the right order you can't tell. I originally tested this with a data set where it accidentally gives the right answer.
Update: this uses the original data. The method="large"
and method="score"
agree; the default method="small"
also agrees if you sort the data so that the times are in ascending order (which won't be necessary for future releases)
> ii<-with(D, order(event_time, event_flag))
>
> svy_design2 = svydesign( ids = ~unique_id , weights = ~weights, data = D[ii,] )
>
> svylogrank(formula = svykm_formula, design = svy_design2)
[[1]]
score
[1,] 6.994881 3.771016 1.854907 0.06360957
[[2]]
chisq p
3.44067835 0.06360957
attr(,"class")
[1] "svylogrank"
> svylogrank(formula = svykm_formula, design = svy_design, method = "large")
[[1]]
score se z p
1 6.994881 3.771016 1.854907 0.06360957
[[2]]
chisq p
3.44067835 0.06360957
attr(,"class")
[1] "svylogrank"
>
> svylogrank(formula = svykm_formula, design = svy_design, method = "score")
z.groupgroup2 Chisq p
1.85490656 3.44067835 0.06360957
attr(,"class")
[1] "svylogrank"