How to use survey to analyze the American Housing Survey data using replicate weights


I'm analyzing data from the American Housing Survey, which ships with replicate weights for computing correct standard errors, in R with survey, and I want to make sure that I'm specifying the design correctly.

Here is how I do it:

svy <- svrepdesign(data = ahs,
                   weights = ~WEIGHT,
                   repweights = "REPWEIGHT[0-9]+",
                   type = "Fay",
                   rho = 0.5,
                   scale = 4/160,
                   rscales = rep(1, 160),
                   mse = TRUE)

I set rho to 0.5 because section 3.1 of the Census Bureau's guide to using replicate weights (https://www.census.gov/content/dam/Census/programs-surveys/ahs/tech-documentation/2015/Quick%20Guide%20to%20Estimating%20Variance%20Using%20Replicate%20Weights%202009%20to%20Current.pdf), which explains how to compute standard errors with SAS, says to use the option VARMETHOD=BRR(FAY) without specifying any other options. According to the SAS documentation (http://support.sas.com/documentation/onlinedoc/stat/142/surveymeans.pdf), the default value of the Fay coefficient is then 0.5.

I set mse to TRUE because, in the formula they give for the standard error in section 4, the sum of squared deviations is calculated around the estimate of the statistic computed with the full sample weights.

Finally, I set scale to 4/160 and rscales to rep(1, 160) because, in that same formula, the sum of squared deviations is multiplied by 4/160 but there is no multiplier inside the sum operator.
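Written out, my reading of that formula is the following, where \hat\theta_0 is the estimate computed with the full-sample weight and \hat\theta_r is the estimate computed with the r-th replicate weight (the notation is mine and may differ from the guide's):

```latex
\mathrm{SE}(\hat\theta_0)
  = \sqrt{\frac{4}{160} \sum_{r=1}^{160} \left(\hat\theta_r - \hat\theta_0\right)^2}
```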

However, when I look at Anthony Joseph Damico's webpage on the American Housing Survey (http://asdfree.com/american-housing-survey-ahs.html), he does this:

ahs_design <- 
    svrepdesign(
        weights = ~ wgt90geo ,
        repweights = "repwgt[1-9]" ,
        type = "Fay" ,
        rho = ( 1 - 1 / sqrt( 4 ) ) ,
        mse = TRUE ,
        data = ahs_df
    )

Setting aside the names of the weight variables, which changed in 2015 (presumably after he wrote that webpage), he's doing the same thing as I am, except that he doesn't specify scale and rscales. Based on what I explain above and the documentation of survey, it seems to me that he should specify them as I did, but I've never used replicate weights with survey before, so I would like to make sure.

P. S. What I find even weirder is that, when I omit scale and rscales, the standard errors I compute seem to be the same as when I specify them. This means that it probably doesn't matter in practice how I do it, but since the formula used to compute the standard errors is supposed to be different if I specify scale and rscales, I would still like to understand why it doesn't seem to affect the standard errors computed by survey.

P. S. bis: Another thing I don't understand is that, even though the Census Bureau says it used Fay's method and recommends a SAS procedure that results in a Fay coefficient of 0.5, there doesn't seem to be any Fay coefficient in the formula for the standard error given in the guide it published. This means that, if I wrote my own code to compute standard errors using that formula, the result would presumably differ from what I get with survey and a rho of 0.5, or with the SAS procedure recommended by the Census Bureau, which doesn't make a lot of sense to me.


Solution

  • svrepdesign doesn't need scale or rscales arguments for Fay replicate weights, because it can work them out by itself. That's the point of having known types of weights. I should probably add a warning for when you specify them anyway.

    The formula doesn't need an explicit Fay coefficient. When the weights were constructed, the sampling weights were multiplied by 2-rho or rho to get the replicate weights. That's all been done already, so all you need to know now is how to scale the squared residuals. The Census Bureau formula (p. 6 of your link) has a multiplier of 4/160. That 4 is 1/(1-rho)^2 with rho = 0.5; Anthony Damico's code does the reverse conversion, working out rho = 0.5 from the 4.

    Straightforward BRR would have a multiplier of 1/160 rather than 4/160.
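The arithmetic above can be checked with a small base-R sketch. It does not call survey at all. The values theta_full and theta_rep are made-up numbers standing in for a full-sample estimate and 160 replicate estimates, not AHS results, and the variable names are only illustrative.

```r
# Toy illustration only: theta_full and theta_rep are hypothetical numbers,
# not estimates from the American Housing Survey.
set.seed(42)

n_rep <- 160   # number of replicate weights
rho   <- 0.5   # Fay coefficient

# Fay replicate weights are the sampling weight times (2 - rho) or rho.
fay_factors <- c(2 - rho, rho)

# Multiplier applied to the sum of squared deviations.
fay_scale <- 1 / (n_rep * (1 - rho)^2)  # equals 4 / 160 when rho = 0.5
brr_scale <- 1 / n_rep                  # straightforward BRR (rho = 0)

# Reverse conversion: recover rho from the 4 in the guide's 4 / 160.
rho_from_multiplier <- 1 - 1 / sqrt(4)

theta_full <- 10
theta_rep  <- theta_full + rnorm(n_rep, sd = 0.2)

# Deviations are taken around the full-sample estimate,
# which is what mse = TRUE asks for.
ss_dev   <- sum((theta_rep - theta_full)^2)
se_fay   <- sqrt(fay_scale * ss_dev)   # scale derived from rho
se_guide <- sqrt(4 / 160 * ss_dev)     # scale written as in the guide

print(c(fay_scale = fay_scale, guide_scale = 4 / 160, brr_scale = brr_scale))
print(c(se_fay = se_fay, se_guide = se_guide))
```

Because fay_scale and 4/160 are the same number, se_fay and se_guide agree, which is consistent with the standard errors not changing when scale and rscales are passed explicitly.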