As indicated here, to calculate proportions of a categorical variable with the amazing srvyr package, you first have to group by the variables (as factors) and then call srvyr::survey_mean() with no arguments, as in this example.
My goal is to iterate over the second grouping variables, cname and sch.wide, while keeping the first grouping variable, stype, so that I avoid duplicating the code.
library(survey)
library(srvyr)
data(api)
df <- apiclus1 %>%
  mutate(cname = as.factor(cname)) %>%
  select(pw, stype, cname, sch.wide) %>%
  as_survey_design(weights = pw)
# proportions of sch.wide
df %>%
  group_by(stype, sch.wide) %>%
  summarise(prop = srvyr::survey_mean())
#> # A tibble: 6 x 4
#> stype sch.wide prop prop_se
#> <fct> <fct> <dbl> <dbl>
#> 1 E No 0.0833 0.0231
#> 2 E Yes 0.917 0.0231
#> 3 H No 0.214 0.110
#> 4 H Yes 0.786 0.110
#> 5 M No 0.32 0.0936
#> 6 M Yes 0.68 0.0936
# proportions of cname
df %>%
  group_by(stype, cname) %>%
  summarise(prop = srvyr::survey_mean())
#> # A tibble: 33 x 4
#> stype cname prop prop_se
#> <fct> <fct> <dbl> <dbl>
#> 1 E Alameda 0.0556 0.0191
#> 2 E Fresno 0.0139 0.00978
#> 3 E Kern 0.00694 0.00694
#> 4 E Los Angeles 0.0833 0.0231
#> 5 E Mendocino 0.0139 0.00978
#> 6 E Merced 0.0139 0.00978
#> 7 E Orange 0.0903 0.0239
#> 8 E Plumas 0.0278 0.0137
#> 9 E San Diego 0.347 0.0398
#> 10 E San Joaquin 0.208 0.0339
#> # ... with 23 more rows
Created on 2019-11-28 by the reprex package (v0.3.0)
Maybe the way to go here is to create a list that keeps the first grouping variable and splits the data by each of the other grouping variables, and then calculate the proportions.
I would like to find a solution that involves purrr::map or the tidyverse.
Thanks in advance for the help, or for pointing to the answer!
There are multiple ways. If the variable names are passed as strings, one option is to use group_by_at, which accepts strings as arguments inside vars()
library(purrr)
library(dplyr)
library(survey)
library(srvyr)
map(c('sch.wide', 'cname'), ~
  df %>%
    group_by_at(vars("stype", .x)) %>%
    summarise(prop = srvyr::survey_mean()))
#[[1]]
# A tibble: 6 x 4
# stype sch.wide prop prop_se
# <fct> <fct> <dbl> <dbl>
#1 E No 0.0833 0.0231
#2 E Yes 0.917 0.0231
#3 H No 0.214 0.110
#4 H Yes 0.786 0.110
#5 M No 0.32 0.0936
#6 M Yes 0.68 0.0936
#[[2]]
# A tibble: 30 x 4
# stype cname prop prop_se
# <fct> <fct> <dbl> <dbl>
# 1 E Alameda 0.0556 0.0191
# 2 E Fresno 0.0139 0.00978
# 3 E Kern 0.00694 0.00694
# 4 E Los Angeles 0.0833 0.0231
# 5 E Mendocino 0.0139 0.00978
# 6 E Merced 0.0139 0.00978
# 7 E Orange 0.0903 0.0239
# 8 E Plumas 0.0278 0.0137
# 9 E San Diego 0.347 0.0398
#10 E San Joaquin 0.208 0.0339
# … with 20 more rows
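A variation on the string-based approach, offered as a minimal sketch rather than the only way to do it: convert each string to a symbol with rlang::sym() and unquote it inside group_by, and name the list elements so each result can be looked up by variable. The names second_vars and props are hypothetical, and df is rebuilt here only so that the block runs on its own.

```r
library(survey)
library(srvyr)
library(dplyr)
library(purrr)

# Rebuild the survey design from the question
data(api)
df <- apiclus1 %>%
  mutate(cname = as.factor(cname)) %>%
  select(pw, stype, cname, sch.wide) %>%
  as_survey_design(weights = pw)

# Hypothetical name: the second grouping variables to iterate over
second_vars <- c("sch.wide", "cname")

props <- second_vars %>%
  set_names() %>%                          # name each element after its variable
  map(function(v) {
    df %>%
      group_by(stype, !!rlang::sym(v)) %>% # string -> symbol, then unquote
      summarise(prop = survey_mean())
  })

# Look up one result by variable name
props$sch.wide
```

Naming the list is optional, but it avoids having to remember that `[[1]]` is sch.wide and `[[2]]` is cname.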
Another option is to wrap the bare variable names in quos() to create a list of quosures, and then unquote each one with !! inside group_by:
map(quos(sch.wide, cname), ~
  df %>%
    group_by(stype, !!.x) %>%
    summarise(prop = srvyr::survey_mean()))
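Both approaches return a list with one tibble per variable. If a single long table is more convenient, one possible sketch is to rename the second grouping column to a common name before binding the rows. The column names variable and level and the object name combined are assumptions made for illustration, not part of the original answer, and df is rebuilt so the block is self-contained.

```r
library(survey)
library(srvyr)
library(dplyr)
library(purrr)

# Rebuild the survey design from the question
data(api)
df <- apiclus1 %>%
  mutate(cname = as.factor(cname)) %>%
  select(pw, stype, cname, sch.wide) %>%
  as_survey_design(weights = pw)

combined <- c("sch.wide", "cname") %>%
  set_names() %>%
  map(function(v) {
    df %>%
      group_by(stype, !!rlang::sym(v)) %>%
      summarise(prop = survey_mean()) %>%
      ungroup() %>%
      rename(level = all_of(v)) %>%        # common column name across variables
      mutate(level = as.character(level))  # factors with different levels -> character
  }) %>%
  bind_rows(.id = "variable")              # "variable" records the source variable

combined
```

Converting level to character before binding avoids combining factors that have different level sets.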