Search code examples
rfunctionsurveyconfidence-interval

Set custom confidence level in svymean when using svyby in r survey package


I am running code to find weighted means by group with confidence intervals for multiple variables in my dataset. My code looks like the following, demoed using mtcars package:

library(survey)
library(tidyverse)
var_list = c("wt","qsec")
  
svy_design <- svydesign(
  ids = ~1, 
  data = mtcars |> 
    dplyr::select(cyl,all_of(var_list),mpg) |> 
    na.omit(),
  weights =  mtcars |> 
    dplyr::select(cyl,all_of(var_list),mpg) |> 
    na.omit() |> select(mpg)
)

lapply(var_list, function( x ) svyby(as.formula( paste0( "~" , x ) ) ,
                     by = ~cyl, 
                     design = svy_design, 
                     FUN = svymean,
                     keep.names = FALSE, vartype = "ci")) %>% 
  bind_rows() |> relocate(wt, .after = last_col()) %>%  pivot_longer(!c(cyl,ci_l,ci_u),names_to = "var",values_to = "mean",values_drop_na = TRUE)


This works just fine, and produces the desired result of confidence intervals for each variable. However, when I try to switch the confidence interval from the default 0.95, I receive errors:

library(survey)
library(tidyverse)
var_list = c("wt","qsec")
  
svy_design <- svydesign(
  ids = ~1, 
  data = mtcars |> 
    dplyr::select(cyl,all_of(var_list),mpg) |> 
    na.omit(),
  weights =  mtcars |> 
    dplyr::select(cyl,all_of(var_list),mpg) |> 
    na.omit() |> select(mpg)
)

lapply(var_list, function( x ) svyby(as.formula( paste0( "~" , x ) ) ,
                     by = ~cyl, 
                     design = svy_design, 
                     FUN = svymean(level = 0.99),
                     keep.names = FALSE, vartype = "ci")) %>% 
  bind_rows() |> relocate(wt, .after = last_col()) %>%  pivot_longer(!c(cyl,ci_l,ci_u),names_to = "var",values_to = "mean",values_drop_na = TRUE)

Error in svymean(level = 0.99) : 
  argument "design" is missing, with no default

How can I use svymean within svyby and set a custom confidence interval level?


Solution

  • The first issue is that FUN has to be a function that takes a formula and a design as its first two arguments, so you should just provide FUN = svymean and any other arguments to FUN inside ....

    However, the confidence interval is actually calculated through setting vartype = "ci" in svyby, which calls survey:::confint.svyby internally and does not expose any control over the confidence level.

    You can get around this by calculating the CI separately:

    step1 <- lapply(var_list, \(x) {
      ## Just a standard error - do NOT change vartype=... here!
      d1 <- svyby(as.formula(paste0("~", x)) ,
                  by = ~cyl, 
                  design = svy_design, 
                  FUN = svymean,
                  keep.names = FALSE)
      ## NOW calculate the CI with the desired level
      d2 <- confint(d1, level = .99)
      ## Combine results
      d1[,c("ci_l", "ci_u")] <- d2
      dplyr::select(d1, -"se")
    })
    

    The rest of the pipeline continues unchanged:

    bind_rows(step1) %>%
      relocate(wt, .after = last_col()) %>%
      pivot_longer(!c(cyl, ci_l, ci_u),
                   names_to = "var",
                   values_to = "mean",
                   values_drop_na = TRUE)
    
    #> A tibble: 6 × 5
    #>     cyl  ci_l  ci_u var    mean
    #>   <dbl> <dbl> <dbl> <chr> <dbl>
    #> 1     4  1.80  2.64 wt     2.22
    #> 2     6  2.78  3.43 wt     3.10
    #> 3     8  3.48  4.36 wt     3.92
    #> ...