In July of 2021, the survey package overhauled the svyquantile()
function. Quoting from the package's NEWS:
svyquantile()
has been COMPLETELY REWRITTEN. The old version is available asoldsvyquantile()
.
The svyquantile()
was being used in one of my packages when this change occurred. A co-author updated the code to retain use of the old version of the function for continuity. Now that it's been 1.5 years since the release, we're updating my code to use the new version. However, we're am experiencing an issue with the new version when combining its usage with svyby()
.
When we use the updated function, the results for all levels of the by
variable come out equivalent. Also, the returned data frame duplicates each of the columns (once for each level of the by
variable).
My guess is that rather than showing the results for the by
variable in each row, the results are mistakenly being added as new columns (with repeated column names, and duplicated rows).
Am I doing something wrong with svyby()
, or is this a bug in the code? The survey package is so widely used and I am no expert in its use, so I lean toward there being an issue with my code. But I am having trouble diagnosing the problem.
Thank you!
suppressPackageStartupMessages(library(survey))
packageVersion("survey")
#> [1] '4.1.1'
data(api)
# OLD svyquantile, results as expected
svyby(
# svyby args
formula = ~api00,
by = ~both,
design = svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc),
FUN = oldsvyquantile,
# args being passed to oldsvyquantile
na.rm = TRUE,
keep.var = FALSE,
quantiles = 0.5
)
#> both statistic
#> No No 631.0
#> Yes Yes 653.5
# NEW svyquantile, I don't understand what I am doing wrong
svyby(
# svyby args
formula = ~api00,
by = ~both,
design = svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc),
FUN = svyquantile,
# args being passed to svyquantile
na.rm = TRUE,
keep.var = FALSE,
quantiles = 0.5
)
#> both statistic.statistic.api00.quantile statistic.statistic.api00.ci.2.5
#> No No 631 547
#> Yes Yes 631 547
#> statistic.statistic.api00.ci.97.5 statistic.statistic.api00.se
#> No 722 39.75493
#> Yes 722 39.75493
#> statistic.statistic.api00.quantile statistic.statistic.api00.ci.2.5
#> No 655 566
#> Yes 655 566
#> statistic.statistic.api00.ci.97.5 statistic.statistic.api00.se
#> No 717 34.94774
#> Yes 717 34.94774
# results are as expected when used outside `svyby()`
svyquantile(
x = ~api00,
design = svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc),
na.rm = TRUE,
keep.var = FALSE,
quantiles = 0.5
)
#> $api00
#> quantile ci.2.5 ci.97.5 se
#> 0.5 652 561 714 35.66788
#>
#> attr(,"hasci")
#> [1] TRUE
#> attr(,"class")
#> [1] "newsvyquantile"
Created on 2022-12-25 with reprex v2.0.2
It's a bug. It's fixed in the development version, on r-forge (https://r-forge.r-project.org/R/?group_id=1788)