I'm new here and new to R. I'm wondering whether I've used the R survey package correctly to postStratify my data. Below you can see the data structure of my dataframe (df).
utype | gender | age | regzeit | finanz | sfeld | sindex |
---|---|---|---|---|---|---|
pri | female | 23 | ja | s | ARG | 5 |
sta | male | 23 | nein | f | ARG | -7 |
sta | female | 21 | ja | ARG | 11 | |
pri | male | 28 | ja | t | ARG | 1 |
I've oversampled females for the "gender" variable and students for the "utype" variable and now want to adjust for the distribution in the population. My n=383 was oversampled to n = 477 The intended distributions of my my n=383 sample are:
utype | male | female | Sum |
---|---|---|---|
pri | 54 | 68 | 122 |
sta | 128 | 133 | 261 |
Sum | 187 | 196 | 383 |
design <- svydesign(id = ~utype+gender, data = df)
Warning message: In svydesign.default(id = ~utype + gender, data = df) : No weights or probabilities supplied, assuming equal probability
pop.types <- data.frame(utype=c("sta","pri"), Freq=c(261,122))
designp <- postStratify(design, ~utype, pop.types)
postStratify(design, ~utype, pop.types)
svymean(~sindex, design)
mean | SE
sindex 0.48008 | 0.0192
svymean(~sindex, designp)
mean | SE
sindex 0.47692 | 0
My question now is whether the below code is correct and how I can postStratify for both variables utype and gender in my code or whether I have to run the postStratify command twice. I'm especially concerned that the standard error is zero in my weighted sample and because of the warning message. And whether the Freq values are correct for what I'm trying to do here?
The last thing I've been trying to figure out is how to get the svymean, svyhist or svyboxplot functions for "sindex", but only for the observations with utype == pri, so by group basically. This should all be applied to the weighted sindex values.
I hope I'm conforming to all the rules. Many thanks in advance!
You don't want to post-stratify twice (that would give you raking). You want to post-stratify once, using a post-stratum variable that is a combination of your two variables gender, as in your 2x2 table. That is,
designp <- update(designp, combined_var = interaction(gender,utype))
You now specify a pop.types=
argument that has the desired frequencies for each of the four levels of this new variable.
With only the four observations you list, you will end up with zero standard errors because there isn't any variation within any of the post-strata.