How do I get the weights from a survey design object in R?


Question

When working with complex survey data in R, I often use the survey package to create sampling weights or update them using a method such as raking or post-stratification. I know the weights are stored in a survey design object, but how do I extract those weights so I can inspect them or save them to a data file?

Example Data

As an example, we'll load a survey dataset from the svrep R package and create a survey design object, along with a bootstrap replicate design object.

data("lou_vax_survey", package = 'svrep')

library(survey)

# Create a survey design object ----
survey_design <- svydesign(data = lou_vax_survey,
                           weights = ~ SAMPLING_WEIGHT,
                           ids = ~ 1)

# Create a replicate survey design object ----
# (bootstrap replicates are drawn at random, so the replicate weights vary between runs)
rep_survey_design <- as.svrepdesign(survey_design,
                                    type = "bootstrap",
                                    replicates = 10)

Solution

  • Extracting full-sample weights as a vector

    To extract the full-sample weights from a survey design object, you can use the function weights().

    If you're working with a "regular" survey design object without replicate weights, you can simply use the following:

    wts <- weights(survey_design)
    
    head(wts)
    #       1       2       3       4       5       6 
    # 596.702 596.702 596.702 596.702 596.702 596.702 
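
    Once extracted, the weights are an ordinary numeric vector, so base R tools are enough to inspect them. A minimal sketch, assuming the svrep package is installed (it supplies the example data) and that the design is built directly from the SAMPLING_WEIGHT column as above:

    ```r
    library(survey)
    data("lou_vax_survey", package = "svrep")

    survey_design <- svydesign(data = lou_vax_survey,
                               weights = ~ SAMPLING_WEIGHT,
                               ids = ~ 1)

    wts <- weights(survey_design)

    # One weight per row of the data
    length(wts) == nrow(lou_vax_survey)

    # Distribution of the weights, and their total
    # (for sampling weights, the total is an estimate of the population size)
    summary(wts)
    sum(wts)

    # Before any adjustment, the extracted weights match the input column
    all.equal(as.numeric(wts), lou_vax_survey$SAMPLING_WEIGHT)
    ```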
    

    If you're working with a replicate survey design object, you need to specify type = "sampling" to get the full-sample weights.

    wts <- weights(rep_survey_design, type = 'sampling')
    
    head(wts)
    #       1       2       3       4       5       6 
    # 596.702 596.702 596.702 596.702 596.702 596.702 
    

    Note that, despite the name, type = 'sampling' does not necessarily return the original sampling weights. It returns the design's current full-sample weights, so if you applied post-stratification or raking to your survey design object, for example, calling weights(..., type = 'sampling') will return the post-stratified or raked weights.
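
    To illustrate, here is a minimal sketch that post-stratifies the replicate design by SEX and then extracts the weights. The population totals are made up purely for illustration (hypothetical values, not real figures), and the sketch assumes SEX has no missing values in the example data:

    ```r
    library(survey)
    data("lou_vax_survey", package = "svrep")

    survey_design <- svydesign(data = lou_vax_survey,
                               weights = ~ SAMPLING_WEIGHT,
                               ids = ~ 1)
    rep_survey_design <- as.svrepdesign(survey_design,
                                        type = "bootstrap",
                                        replicates = 10)

    # Hypothetical population totals, one per category of SEX
    sex_levels <- sort(unique(lou_vax_survey$SEX))
    pop_totals <- data.frame(SEX = sex_levels,
                             Freq = 100000 * seq_along(sex_levels))

    ps_design <- postStratify(rep_survey_design,
                              strata = ~ SEX,
                              population = pop_totals)

    # type = "sampling" now returns the post-stratified weights
    ps_wts <- weights(ps_design, type = "sampling")
    ps_sums <- tapply(ps_wts, lou_vax_survey$SEX, sum)
    ps_sums
    ```

    If the adjustment was applied, the weight totals by SEX should equal the Freq column of pop_totals rather than the totals of the original SAMPLING_WEIGHT column.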

  • Extracting the matrix of replicate weights

    For a replicate design object, you can specify weights(rep_survey_design, type = "analysis") to get the matrix of replicate weights.

    rep_wts <- weights(rep_survey_design, type = "analysis")
    
    head(rep_wts)
    #          [,1]     [,2]    [,3]     [,4]     [,5]     [,6]     [,7]     [,8]     [,9]    [,10]
    # [1,] 1193.404 1193.404 596.702 1193.404    0.000  596.702    0.000    0.000 1193.404 1193.404
    # [2,]  596.702  596.702 596.702    0.000    0.000    0.000  596.702    0.000  596.702  596.702
    # [3,] 1193.404  596.702 596.702    0.000 1193.404    0.000 1193.404  596.702    0.000  596.702
    # [4,]    0.000 1193.404 596.702 1193.404 1193.404 1193.404 1790.106 1193.404  596.702    0.000
    # [5,]    0.000    0.000   0.000  596.702 1790.106  596.702    0.000  596.702    0.000    0.000
    # [6,]    0.000 1193.404   0.000 1193.404    0.000  596.702    0.000  596.702    0.000    0.000
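
    For a replicate design, weights() also accepts type = "replication", which is the default when type is omitted. A minimal sketch of how the three types relate; it assumes nothing about how the design stores its replicates and instead checks the design's combined.weights element:

    ```r
    library(survey)
    data("lou_vax_survey", package = "svrep")

    survey_design <- svydesign(data = lou_vax_survey,
                               weights = ~ SAMPLING_WEIGHT,
                               ids = ~ 1)
    rep_survey_design <- as.svrepdesign(survey_design,
                                        type = "bootstrap",
                                        replicates = 10)

    analysis_wts <- weights(rep_survey_design, type = "analysis")

    # One row per observation, one column per replicate
    dim(analysis_wts)

    rep_factors <- weights(rep_survey_design, type = "replication")
    full_wts <- weights(rep_survey_design, type = "sampling")

    if (!rep_survey_design$combined.weights) {
      # Stored values are replicate factors:
      # analysis weight = replicate factor * full-sample weight
      print(all.equal(analysis_wts, rep_factors * full_wts,
                      check.attributes = FALSE))
    } else {
      # Stored values already include the full-sample weights
      print(all.equal(analysis_wts, rep_factors,
                      check.attributes = FALSE))
    }
    ```

    The practical consequence is that calling weights(rep_survey_design) with no type argument may return replicate factors rather than weights you can use directly, so it is safer to state the type explicitly.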
    

  • Saving a data frame with columns of weights

    Let's say you want to save your data to a CSV file so that you can share it with others or load it into Stata/SAS/SPSS. In this case, you'll want to have a data frame with columns for all of your variables as well as columns with the weights.

    For this, you can use the function as_data_frame_with_weights() from the svrep package, which works for survey designs with or without replicate weights.

    library(svrep)
    
    df_with_weights <- rep_survey_design |> 
      as_data_frame_with_weights(full_wgt_name = "FULL_SAMPLE_WGT",
                                 rep_wgt_prefix = "REP_WGT_")
    
    str(df_with_weights)
    # 'data.frame': 1000 obs. of  17 variables:
    # $ RESPONSE_STATUS: chr  "Nonrespondent" ...
    # $ RACE_ETHNICITY : chr  "White alone, not Hispanic or Latino" ...
    # $ SEX            : chr  "Female" ...
    # $ EDUC_ATTAINMENT: chr  "Less than high school" ...
    # $ VAX_STATUS     : chr  NA ...
    # $ SAMPLING_WEIGHT: num  597 ...
    # $ FULL_SAMPLE_WGT: num  597 ...
    # $ REP_WGT_1      : num  1193 ...
    # $ REP_WGT_2      : num  1193 ...
    # $ REP_WGT_3      : num  597 ...
    # $ REP_WGT_4      : num  1193 ...
    # $ REP_WGT_5      : num  0 0 ...
    # $ REP_WGT_6      : num  597 ...
    # $ REP_WGT_7      : num  0 ...
    # $ REP_WGT_8      : num  0 0 ...
    # $ REP_WGT_9      : num  1193 ...
    # $ REP_WGT_10     : num  1193 ...
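
    The resulting data frame can then be written to disk with base R. A minimal sketch, where the file name survey_with_weights.csv is a hypothetical choice and the file goes to a temporary directory so that nothing in your working directory is overwritten:

    ```r
    library(survey)
    library(svrep)
    data("lou_vax_survey", package = "svrep")

    survey_design <- svydesign(data = lou_vax_survey,
                               weights = ~ SAMPLING_WEIGHT,
                               ids = ~ 1)
    rep_survey_design <- as.svrepdesign(survey_design,
                                        type = "bootstrap",
                                        replicates = 10)

    df_with_weights <- as_data_frame_with_weights(rep_survey_design,
                                                  full_wgt_name = "FULL_SAMPLE_WGT",
                                                  rep_wgt_prefix = "REP_WGT_")

    # Hypothetical output location; replace with your own path
    out_file <- file.path(tempdir(), "survey_with_weights.csv")
    write.csv(df_with_weights, file = out_file, row.names = FALSE)

    # Read the file back to confirm the weight columns were saved
    df_check <- read.csv(out_file)
    grep("WGT", names(df_check), value = TRUE)
    ```

    Setting row.names = FALSE keeps R's row labels out of the file, which avoids an extra unnamed column when the CSV is opened in other software.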