Search code examples
rsurvey

Incorrect results using the new `svyquantile()` with `svyby()`


In July of 2021, the survey package overhauled the svyquantile() function. Quoting from the package's NEWS:

svyquantile() has been COMPLETELY REWRITTEN. The old version is available as oldsvyquantile().

The svyquantile() was being used in one of my packages when this change occurred. A co-author updated the code to retain use of the old version of the function for continuity. Now that it's been 1.5 years since the release, we're updating my code to use the new version. However, we're am experiencing an issue with the new version when combining its usage with svyby().

When we use the updated function, the results for all levels of the by variable come out equivalent. Also, the returned data frame duplicates each of the columns (once for each level of the by variable).

My guess is that rather than showing the results for the by variable in each row, the results are mistakenly being added as new columns (with repeated column names, and duplicated rows).

Am I doing something wrong with svyby(), or is this a bug in the code? The survey package is so widely used and I am no expert in its use, so I lean toward there being an issue with my code. But I am having trouble diagnosing the problem.

Thank you!

suppressPackageStartupMessages(library(survey))
packageVersion("survey")
#> [1] '4.1.1'
data(api)

# OLD svyquantile, results as expected
svyby(
  # svyby args
  formula = ~api00,
  by = ~both,
  design = svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc),
  FUN = oldsvyquantile,
  # args being passed to oldsvyquantile
  na.rm = TRUE,
  keep.var = FALSE,
  quantiles = 0.5
)
#>     both statistic
#> No    No     631.0
#> Yes  Yes     653.5


# NEW svyquantile, I don't understand what I am doing wrong
svyby(
  # svyby args
  formula = ~api00,
  by = ~both,
  design = svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc),
  FUN = svyquantile,
  # args being passed to svyquantile
  na.rm = TRUE,
  keep.var = FALSE,
  quantiles = 0.5
)
#>     both statistic.statistic.api00.quantile statistic.statistic.api00.ci.2.5
#> No    No                                631                              547
#> Yes  Yes                                631                              547
#>     statistic.statistic.api00.ci.97.5 statistic.statistic.api00.se
#> No                                722                     39.75493
#> Yes                               722                     39.75493
#>     statistic.statistic.api00.quantile statistic.statistic.api00.ci.2.5
#> No                                 655                              566
#> Yes                                655                              566
#>     statistic.statistic.api00.ci.97.5 statistic.statistic.api00.se
#> No                                717                     34.94774
#> Yes                               717                     34.94774

# results are as expected when used outside `svyby()`
svyquantile(
  x = ~api00,
  design = svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc),
  na.rm = TRUE,
  keep.var = FALSE,
  quantiles = 0.5
)
#> $api00
#>     quantile ci.2.5 ci.97.5       se
#> 0.5      652    561     714 35.66788
#> 
#> attr(,"hasci")
#> [1] TRUE
#> attr(,"class")
#> [1] "newsvyquantile"

Created on 2022-12-25 with reprex v2.0.2


Solution

  • It's a bug. It's fixed in the development version, on r-forge (https://r-forge.r-project.org/R/?group_id=1788)