Potential issue in svylogrank test results

I am running a survival model with sampling weights and was hoping to get the weighted logrank test results. When I use the svylogrank() function from the survey package, the results were puzzling. If I run the default svylogrank test, the returned chisq value seemed way too high, and when I run the function with the method set to "score", the results were much more reasonable and as expected. Based on my understanding of the parameter method, it should only matter for performance and should not affect the results, and I believe my model matrix is also relatively small.

Could someone help advise on this issue?

D = data.table(unique_id = 1:135,
               weights = rep(1,135),
               event_time = c(0.53512437, 1.35655869, 2.00414189, 2.37276648, 3.20343526, 0.96618494, 2.57894309, 0.94575080, 1.25347833, 1.44416450, 5.04038200, 7.80587169 ,
                        6.53631154, 6.31914568, 7.00146597, 9.67616088, 7.94212358, 9.70693890, 10.67575835, 10.06764688, 12.29175616, 13.60092871, 13.12508566, 14.66417522,
                        15.35250691, 0.93368707, 0.19087611, 3.15533767, 4.40821633, 17.54334957, 17.95177642, 15.50903946, 16.48376185, 20.87956697, 21.24571398, 22.34297263,
                        23.36042629, 21.01760215, 23.84785038, 26.06105822, 4.16866350, 1.96922485, 0.66199008, 6.76987830, 1.55617685, 0.19095871, 3.13291784, 5.43159409,
                        9.55805671, 4.31437322, 0.78259860, 5.26415156, 3.45095686, 1.69128712, 8.41942426, 3.33748695, 6.08516173, 2.72897404, 0.22789783, 0.86348009,
                        2.35707587, 2.97477615, 12.33273800, 0.58532123, 0.14586238, 10.67948547, 4.07655972, 3.94405136, 0.37226898, 1.42558725, 1.47680658, 4.22506540,
                        1.56703478, 8.37484756, 12.54015087, 1.80994787, 3.66453633, 1.02834532, 1.99065652, 1.23577436, 16.21981618, 14.35039798, 4.15321606, 2.79740679,
                        0.35538726, 7.46823358, 1.66329088, 7.46525382, 2.62734831, 3.19057957, 0.33317193, 0.09122886, 9.14616245, 2.48542578, 2.37569263, 5.48499630,
                        2.22749399, 2.64816296, 0.97101545, 1.42468625, 1.27668904, 0.03692447, 1.98783210, 5.47692729, 3.88316178, 0.32921277, 1.77225345, 9.33268901,
                        2.44517775, 1.49813702, 2.56059172, 3.43194832, 1.22955630, 3.56263947, 9.07060099, 3.58312362, 2.22755370, 4.24783776, 3.46364804, 1.61671354,
                        11.10973565, 7.18764270, 1.80400046, 6.39833474, 6.72825192, 6.46063344, 5.76855531, 5.27157807, 4.66154734, 3.50019718, 2.27156678, 3.28531594,
                        2.35699896, 2.94956000, 8.85381736),
               event_flag = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 
                              1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 
                              1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0),
               group = c(rep("group1", 40), rep("group2", 95)))

svykm_formula = "Surv(event_time, event_flag) ~ group" %>% as.formula
svy_design = svydesign( ids = D[ , unique_id ], weights = D[ , weights], data = D )

svylogrank(formula = svykm_formula, design = svy_design)
svylogrank(formula = svykm_formula, design = svy_design, method = "score")

Results:

> svylogrank(formula = svykm_formula, design = svy_design)
[[1]]
         score                               
[1,] -21.31458 3.99272 -5.338361 9.379041e-08

[[2]]
       chisq            p 
2.849810e+01 9.379041e-08 

attr(,"class")
[1] "svylogrank"
> svylogrank(formula = svykm_formula, design = svy_design, method = "score")
z.groupgroup2         Chisq             p 
   1.85490656    3.44067835    0.06360957 
attr(,"class")
[1] "svylogrank"

Solution

Yes, it's a bug (which will be fixed in the next version). Both method="large" and method="score" are correct, but the default method="small" isn't.

The problem is that survival::coxph.detail doesn't do quite what is documented for the at-risk matrix (or I've misunderstood it). However, if the observations are in the right order you can't tell. I originally tested this with a data set where it accidentally gives the right answer.

Update: this uses the original data. The method="large" and method="score" agree; the default method="small" also agrees if you sort the data so that the times are in ascending order (which won't be necessary for future releases)

> ii<-with(D, order(event_time, event_flag))
> 
> svy_design2 = svydesign( ids = ~unique_id , weights = ~weights, data = D[ii,] )
> 
> svylogrank(formula = svykm_formula, design = svy_design2)
[[1]]
        score                             
[1,] 6.994881 3.771016 1.854907 0.06360957

[[2]]
     chisq          p 
3.44067835 0.06360957 

attr(,"class")
[1] "svylogrank"
> svylogrank(formula = svykm_formula, design = svy_design, method = "large")
[[1]]
     score       se        z          p
1 6.994881 3.771016 1.854907 0.06360957

[[2]]
     chisq          p 
3.44067835 0.06360957 

attr(,"class")
[1] "svylogrank"
> 
> svylogrank(formula = svykm_formula, design = svy_design, method = "score")
z.groupgroup2         Chisq             p 
   1.85490656    3.44067835    0.06360957 
attr(,"class")
[1] "svylogrank"