I've been analyzing a survey from the Dominican Republic called ENHOGAR 2021 with as_survey_design().
The issue is that my dataset has both weights and probabilities, identified as "Factor_expansión" and "Factor_ponderación" respectively.
I understand that the as_survey_design() function needs one or the other, so you would have to create two design objects, one with weights and one with probs.
Is there a way to use both variables in a single as_survey_design() call?
OK. I have downloaded some data from the survey. Here is a scatterplot of the two weight variables:
[scatterplot of the two weight variables]
and here's a summary of them:
> summary(a$F_expansión)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
   3.268   45.688   76.628  124.666  128.420 9385.545
> summary(a$F_Pondera)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
 0.02621  0.36649  0.61467  1.00000  1.03011 75.28567
> sum(a$F_expansión)
[1] 10568913
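One quick check on these summaries is to divide them element by element. The sketch below uses only the rounded numbers printed above (so the ratios agree only up to rounding), and the variable names are mine rather than anything in the survey file:

```r
# Summary statistics copied from the console output above (rounded values)
exp_q  <- c(min = 3.268, q1 = 45.688, median = 76.628,
            mean = 124.666, q3 = 128.420, max = 9385.545)
pond_q <- c(min = 0.02621, q1 = 0.36649, median = 0.61467,
            mean = 1.00000, q3 = 1.03011, max = 75.28567)

# If one variable is a constant multiple of the other,
# every ratio should be (nearly) the same number
ratio <- exp_q / pond_q
print(round(ratio, 2))

# The sum of the raw weights divided by their mean is the number of rows
implied_n <- 10568913 / 124.666
print(round(implied_n))
```

A near-constant ratio is what you would expect if one column is just the other divided by its mean.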
The two sets of weights contain the same information, except that the ponderación weights are scaled to have unit mean and the expansión weights sum to what is plausibly the size of the sampling frame (Google says 11 million for the national population). If you were using software such as SPSS or Excel that doesn't understand sampling weights, you could use ponderación as the weights and get approximately correct inference for means, proportions, quantiles, regression coefficients, and so on: everything except totals.
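To illustrate why rescaling leaves means alone but changes totals, here is a minimal base R sketch on simulated data. Nothing in it comes from the ENHOGAR file: `w_exp`, `w_pond` and `y` are made-up stand-ins for the raw weights, the unit-mean weights and an outcome variable.

```r
# Simulated stand-ins, NOT the survey data
set.seed(1)
n      <- 1000
w_exp  <- rexp(n, rate = 1 / 125)     # raw ("expansion") weights
w_pond <- w_exp / mean(w_exp)         # same weights rescaled to mean 1
y      <- rnorm(n, mean = 50, sd = 10)

# Weighted means: the scaling constant cancels, so both agree
mean_exp  <- weighted.mean(y, w_exp)
mean_pond <- weighted.mean(y, w_pond)
print(all.equal(mean_exp, mean_pond))

# Weighted totals: the scaling constant does not cancel
total_exp  <- sum(w_exp * y)    # on the scale of the population
total_pond <- sum(w_pond * y)   # on the scale of the sample size
print(total_exp / total_pond)   # equals mean(w_exp)
```

This only covers point estimates; standard errors still need software that knows about the design.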
If you want to estimate totals you need the expansión weights. In the R survey package (or the srvyr wrapper) you should just use the expansión weights, because the package can handle sampling weights correctly. The same holds if you're using SUDAAN or Stata or SPSS Complex Samples or the various SAS PROC SURVEYwhatevers.
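In srvyr that could look roughly like the sketch below. It runs on a tiny made-up data frame, not the real file: `w_expansion` stands in for the expansión column (F_expansión in the data I downloaded), `y` is a hypothetical 0/1 outcome, and I have left out strata and cluster ids because I don't know the survey's design variables. With the real data you would want to pass those to as_survey_design() as well.

```r
library(dplyr)
library(srvyr)

# Toy stand-in for the survey file; all values are invented
a <- data.frame(
  w_expansion = c(45.7, 76.6, 128.4, 124.7),  # hypothetical raw weights
  y           = c(1, 0, 1, 1)                 # hypothetical 0/1 outcome
)

# Only the raw (expansion) weights go into the design;
# there is no need to also supply probs
design <- a %>%
  as_survey_design(weights = w_expansion)

# Proportions and totals both come from the same design object
res <- design %>%
  summarise(
    prop  = survey_mean(y),
    total = survey_total(y)
  )
print(res)
```

So one design object built from the expansión weights is enough; a second object built from probs is not needed.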
Update: I have checked with a Latin American colleague and these seem to be standard terms: 'pesos de expansion' for the raw weights and 'pesos ponderados' for the scaled weights.