
Iterate through groups of variables in a survey - R


To calculate proportions of a categorical variable with the srvyr package, you first have to group by the variables (as factors) and then call srvyr::survey_mean() with no arguments, as in the example below.

My goal is to iterate over the second grouping variable (cname, then sch.wide) while keeping the first grouping variable, stype, fixed, so that I don't have to duplicate the code.

library(survey)
library(srvyr)

data(api)

df <- apiclus1 %>%
  mutate(cname = as.factor(cname)) %>%
  select(pw, stype, cname, sch.wide) %>%
  as_survey_design(weights = pw)

# proportions of sch.wide
df %>% 
  group_by(stype,sch.wide) %>% 
  summarise(prop=srvyr::survey_mean())
#> # A tibble: 6 x 4
#>   stype sch.wide   prop prop_se
#>   <fct> <fct>     <dbl>   <dbl>
#> 1 E     No       0.0833  0.0231
#> 2 E     Yes      0.917   0.0231
#> 3 H     No       0.214   0.110 
#> 4 H     Yes      0.786   0.110 
#> 5 M     No       0.32    0.0936
#> 6 M     Yes      0.68    0.0936

# proportions of cname
df %>% 
  group_by(stype,cname) %>% 
  summarise(prop=srvyr::survey_mean())
#> # A tibble: 33 x 4
#>    stype cname          prop prop_se
#>    <fct> <fct>         <dbl>   <dbl>
#>  1 E     Alameda     0.0556  0.0191 
#>  2 E     Fresno      0.0139  0.00978
#>  3 E     Kern        0.00694 0.00694
#>  4 E     Los Angeles 0.0833  0.0231 
#>  5 E     Mendocino   0.0139  0.00978
#>  6 E     Merced      0.0139  0.00978
#>  7 E     Orange      0.0903  0.0239 
#>  8 E     Plumas      0.0278  0.0137 
#>  9 E     San Diego   0.347   0.0398 
#> 10 E     San Joaquin 0.208   0.0339 
#> # ... with 23 more rows
Created on 2019-11-28 by the reprex package (v0.3.0)

Maybe the way to go here is to build a list that keeps the first grouping variable and splits the data by each of the other variables in turn, and then calculate the proportions for each element.

I would like to find a solution that uses purrr::map or other tidyverse tools.

Thanks in advance for the help, or for pointing to the answer!


Solution

  • There are multiple ways to do this. If the variable names are passed as strings, one option is group_by_at, which accepts strings inside vars():

    library(purrr)
    library(dplyr)
    library(survey)
    library(srvyr)
    map(c('sch.wide', 'cname'), ~
            df %>%
               group_by_at(vars("stype", .x)) %>%
               summarise(prop = srvyr::survey_mean()))
    #[[1]]
    # A tibble: 6 x 4
    #  stype sch.wide   prop prop_se
    #  <fct> <fct>     <dbl>   <dbl>
    #1 E     No       0.0833  0.0231
    #2 E     Yes      0.917   0.0231
    #3 H     No       0.214   0.110 
    #4 H     Yes      0.786   0.110 
    #5 M     No       0.32    0.0936
    #6 M     Yes      0.68    0.0936
    
    #[[2]]
    # A tibble: 30 x 4
    #   stype cname          prop prop_se
    #   <fct> <fct>         <dbl>   <dbl>
    # 1 E     Alameda     0.0556  0.0191 
    # 2 E     Fresno      0.0139  0.00978
    # 3 E     Kern        0.00694 0.00694
    # 4 E     Los Angeles 0.0833  0.0231 
    # 5 E     Mendocino   0.0139  0.00978
    # 6 E     Merced      0.0139  0.00978
    # 7 E     Orange      0.0903  0.0239 
    # 8 E     Plumas      0.0278  0.0137 
    # 9 E     San Diego   0.347   0.0398 
    #10 E     San Joaquin 0.208   0.0339 
    # … with 20 more rows
    

    Another option is to wrap the bare variable names in quos() to create a list of quosures, and then unquote each one with !! inside group_by:

    map(quos(sch.wide, cname), ~  
            df %>%
              group_by(stype, !!.x) %>% 
              summarise(prop = srvyr::survey_mean()))
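
A possible variation on the two approaches above, offered as a sketch rather than a definitive method: keep the variable names as strings, convert each one to a symbol with rlang::sym(), and unquote it in group_by the same way the quosure version does. Naming the input vector with purrr::set_names() gives a named list, so each result can be looked up by variable name. The object names svy, vars_to_tabulate and props are illustrative and not part of the original question, and the sketch assumes srvyr's group_by handles !! on a symbol the same way it handles the quosures above.

```r
library(dplyr)
library(purrr)
library(survey)
library(srvyr)

data(api)

# Same survey design as in the question (illustrative object name)
svy <- apiclus1 %>%
  mutate(cname = as.factor(cname)) %>%
  as_survey_design(weights = pw)

# Second grouping variables to iterate over, given as strings
vars_to_tabulate <- c("sch.wide", "cname")

props <- vars_to_tabulate %>%
  set_names() %>%                         # name each element after itself
  map(~ svy %>%
        group_by(stype, !!rlang::sym(.x)) %>%   # string -> symbol, then unquote
        summarise(prop = survey_mean()))

# Look up one result by variable name
props[["sch.wide"]]
```

Because the list is named, props[["cname"]] returns the second table without having to remember its position in the list.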