I have drawn a sample from a sampling frame using a probability proportional to size (PPS) plan. The sample covers 6
strata formed by combining two variables, gender and pre, with the following proportions:
gender High Low Medium
F 0.155 0.155 0.195
M 0.155 0.155 0.185
Now I want to specify the design of my sampled data using svydesign from the R package "survey". How should I define the fpc
(finite population correction) argument?
The documentation says:
For PPS sampling without replacement it is necessary to specify the probabilities for each stage of sampling using the fpc
argument, and an overall weight argument should not be given.
out <- read.csv('https://raw.githubusercontent.com/rnorouzian/d/master/out.csv')
dstrat <- svydesign(id=~1,strata=~gender+pre, data=out, pps = "brewer", fpc = ????)
To add a proportion column, group by 'gender' and 'pre', compute each cell's count divided by the total
count, and left_join the result back onto the original data:
library(dplyr)
out1 <- out %>%
  group_by(gender, pre) %>%
  summarise(n = n(), .groups = 'drop') %>%
  mutate(fpc = n/sum(n)) %>%
  left_join(out)  # joins on the shared columns gender and pre
Alternatively, use adorn_percentages from janitor:
library(janitor)
library(tidyr)
out1 <- out %>%
  tabyl(gender, pre) %>%
  adorn_percentages(denominator = "all") %>%
  pivot_longer(cols = -gender, names_to = 'pre',
               values_to = 'fpc') %>%
  left_join(out)  # joins on the shared columns gender and pre
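The same per-cell proportion can also be computed without any packages. The following is a minimal base R sketch, not part of the original answer; the `toy` data frame and its values are hypothetical stand-ins for `out`, used only so the example runs on its own.

```r
# Toy data standing in for `out` (hypothetical values, not the real file)
toy <- data.frame(
  gender = c("F", "F", "F", "M", "M", "M", "M", "F"),
  pre    = c("High", "High", "Low", "Low", "Medium", "Medium", "Medium", "Low")
)

# ave() applies length() within each gender-by-pre cell and returns
# one value per row, so no separate join step is needed
toy$n   <- ave(rep(1, nrow(toy)), toy$gender, toy$pre, FUN = length)

# Cell proportion: cell count divided by the total number of rows
toy$fpc <- toy$n / nrow(toy)

toy
```

Because `ave()` returns a vector aligned with the rows of the input, the proportion lands directly on each observation, which is the shape the joined result in the dplyr version also has.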
If we need a reusable function:
f1 <- function(dat, grp_cols) {
  dat %>%
    group_by(across(all_of(grp_cols))) %>%
    summarise(n = n(), .groups = 'drop') %>%
    mutate(fpc = n/sum(n)) %>%
    left_join(dat)  # joins back on the grouping columns
}

f1(out, c("gender", "pre"))
#Joining, by = c("gender", "pre")
# A tibble: 200 x 11
# gender pre n fpc no. fake.name sector pretest state email phone
# <chr> <chr> <int> <dbl> <int> <chr> <chr> <int> <chr> <chr> <chr>
# 1 F High 31 0.155 1 Pont Private 1352 NY Pont@...com xxx-xx-6216
# 2 F High 31 0.155 2 Street NGO 1438 CA Street@...com xxx-xx-6405
# 3 F High 31 0.155 3 Galvan Private 1389 NY Galvan@...com xxx-xx-9195
# 4 F High 31 0.155 4 Gorman NGO 1375 CA Gorman@...com xxx-xx-1845
# 5 F High 31 0.155 5 Jacinto Private 1386 CA Jacinto@...com xxx-xx-6237
# 6 F High 31 0.155 6 Shah Public 1384 CA Shah@...com xxx-xx-5723
# 7 F High 31 0.155 7 Randon Private 1360 TX Randon@...com xxx-xx-7542
# 8 F High 31 0.155 8 Koucherik NGO 1439 NY Koucherik@...com xxx-xx-9137
# 9 F High 31 0.155 9 Waters Industry 1414 TX Waters@...com xxx-xx-7560
#10 F High 31 0.155 10 David Industry 1396 CA David@...com xxx-xx-6498
# … with 190 more rows
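As a quick sanity check, the proportions quoted in the question can be compared against the counts in the output above. This is a small sketch rather than part of the original answer; the names `props`, `n_total`, and `counts` are introduced here for illustration, and the total of 200 is taken from the tibble header.

```r
# Cell proportions as reported in the question
props <- matrix(
  c(0.155, 0.155, 0.195,
    0.155, 0.155, 0.185),
  nrow = 2, byrow = TRUE,
  dimnames = list(gender = c("F", "M"), pre = c("High", "Low", "Medium"))
)

n_total <- 200  # row count shown in the tibble header

# Proportions over all six cells should sum to 1
stopifnot(isTRUE(all.equal(sum(props), 1)))

# Implied cell counts; round() guards against floating-point noise
counts <- round(props * n_total)
counts
```

The F/High cell works out to 31 observations, which agrees with the `n` column printed for those rows.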