r, dplyr, statistics, survey

Svydesign: analyzing complex surveys


I've been analyzing a survey from the Dominican Republic called ENHOGAR 2021. For that I used:

  • survey
  • srvyr

The issue here is that my dataset has both weights and probabilities identified as "Factor_ expansión" and "Factor_ponderación" respectively.

I understand that the as_survey_design() function needs one or the other, so you have to create two objects: one with weights and one with probs.

Is there a way to use both variables in the as_survey_design() function?


Solution

  • OK. I have downloaded some data from the survey. Here is a scatterplot of the two weight variables:

    [Figure: scatterplot of the two weight variables]

    and here's a summary of them:

    > summary(a$F_expansión)
        Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
       3.268   45.688   76.628  124.666  128.420 9385.545 
    > summary(a$F_Pondera)
        Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
     0.02621  0.36649  0.61467  1.00000  1.03011 75.28567 
    > sum(a$F_expansión)
    [1] 10568913
    

    The two sets of weights contain the same information, except that the ponderación weights are scaled to have unit mean, while the expansión weights sum to what is plausibly the size of the sampling frame (Google says 11 million for the national population). If you were using software such as SPSS or Excel that doesn't understand sampling weights, you could use ponderación as the weights and get approximately correct inference for everything except totals, that is, for means, proportions, quantiles, regression coefficients, etc.
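
    To make the scaling argument concrete, here is a minimal base-R sketch. It uses simulated stand-in data, not the ENHOGAR file: the vectors `expansion`, `ponderacion` and `y` are hypothetical, and the scaled weights are built by dividing by the mean, which is what the summaries above suggest but is an assumption here.

    ```r
    # Simulated stand-in data (hypothetical; not the real survey file)
    set.seed(1)
    n <- 1000
    expansion   <- rexp(n, rate = 1 / 125)      # raw "expansion" weights
    ponderacion <- expansion / mean(expansion)  # same weights rescaled to unit mean
    y <- rnorm(n, mean = 50, sd = 10)           # some outcome variable

    # A constant scaling factor cancels in a weighted mean,
    # so both sets of weights give the same estimate.
    m_exp <- weighted.mean(y, expansion)
    m_pon <- weighted.mean(y, ponderacion)
    print(all.equal(m_exp, m_pon))

    # Totals do not cancel: the scaled weights sum to the sample size,
    # while the expansion weights sum to the (estimated) population size.
    t_exp <- sum(expansion * y)
    t_pon <- sum(ponderacion * y)
    print(c(sum_expansion = sum(expansion), sum_ponderacion = sum(ponderacion)))
    print(t_exp / t_pon)  # ratio of the two totals is mean(expansion)
    ```

    The ratio printed at the end is the scaling constant, which is why only totals are affected by the choice between the two variables.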

    If you want to estimate totals you need the expansión weights. In the R survey package (or the srvyr wrapper) you should just use the expansión weights, because the package can handle sampling weights correctly. The same holds if you're using SUDAAN or Stata or SPSS Complex Samples or the various SAS PROC SURVEYwhatevers.
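
    As a rough illustration of that advice, the sketch below declares a design with the expansion weights only. Everything about the data is an assumption: the data frame is simulated, the stratum and PSU columns (`estrato`, `upm`) and the outcome `ingreso` are hypothetical names, and the weight column is written in ASCII as `F_expansion` to avoid encoding problems. The real design variables should come from the survey documentation.

    ```r
    library(survey)

    # Simulated stand-in for the survey file (hypothetical columns and design)
    set.seed(42)
    a <- data.frame(
      estrato     = rep(1:4, each = 50),   # hypothetical stratum id
      upm         = rep(1:40, each = 5),   # hypothetical PSU id, nested in strata
      F_expansion = runif(200, 20, 300),   # stand-in for the expansion weights
      ingreso     = rlnorm(200, meanlog = 10)
    )

    # Design declared with the expansion weights only
    des <- svydesign(ids = ~upm, strata = ~estrato,
                     weights = ~F_expansion, data = a)

    print(svymean(~ingreso, des))   # mean: unaffected by rescaling the weights
    print(svytotal(~ingreso, des))  # total: needs the expansion weights

    # svydesign() takes weights OR probs because they are reciprocals:
    # supplying probs = 1 / weight describes the same design.
    a$prob <- 1 / a$F_expansion
    des_p <- svydesign(ids = ~upm, strata = ~estrato,
                       probs = ~prob, data = a)
    print(all.equal(coef(svytotal(~ingreso, des)),
                    coef(svytotal(~ingreso, des_p))))
    ```

    With srvyr the equivalent call would be along the lines of `as_survey_design(a, ids = upm, strata = estrato, weights = F_expansion)`, again with the hypothetical column names used here.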

    Update: I have checked with a Latin American colleague and these seem to be standard terms: 'pesos de expansión' for the raw weights and 'pesos ponderados' for scaled weights.