Calculating standard deviation with svyfgt (R)


I am using t-tests in R to test the significance of the difference in means that arises when adding weights, stratification, and clustering (respectively) to the survey design. The quantity of interest is the FGT measure of poverty, which I calculate using the svyfgt function in the convey package. I run the t-tests by creating a vector for each survey design from its mean, standard deviation, and sample size, so I need to obtain the standard deviation for the svyfgt mean.

For designs built with the survey package, there is a svysd function (provided by the jtools package), which is used to calculate the standard deviation when complex survey designs are applied. This value is quite different from the value obtained by simply multiplying the SE by sqrt(n), as shown below:

library(survey)
library(jtools)  # svysd() comes from jtools, not from survey

wel <- c(68008.19, 128504.61,  21347.69,
             33272.95,  61828.96,  32764.44,
             92545.62,  58431.89,  95596.82,
             117734.27)
rmul <- c(16, 16, 16, 16, 16, 16, 16,
              20, 20, 20)
splin <- c(23149.64, 23149.64, 23149.64, 23149.64, 23149.64,
            21322.23, 21322.23, 21322.23, 21322.23, 21322.23)

survey.data <- data.frame(wel, rmul, splin)

survey_weighted <- svydesign(data = survey.data,
                             ids = ~wel, 
                             weights = ~rmul, 
                             nest = TRUE)

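# Weighted mean of wel and its standard error
svymean(~wel, survey_weighted)

# Standard deviation of wel under the survey design
svysd(~wel, survey_weighted)
# Naive alternative: SE of the mean times sqrt(n)
11498*sqrt(10)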

In the convey package, there is no equivalent "svyfgtsd" function, and simply multiplying the SE by sqrt(n) would seem to yield the wrong answer (based on the previously shown difference in results between svysd and that expression). Therefore, I am not sure how I might obtain the standard deviation for FGT_0_weighted. Is there a function I am not aware of, or a stats concept that might aid me here?

library(convey)

fgtsurvey_weighted <- convey_prep(survey_weighted) 

FGT_0_weighted <- svyfgt(~wel, 
                         fgtsurvey_weighted,  
                         g=0, 
                         abs_thresh = survey.data$splin)
FGT_0_weighted

For reference, I will be using the sd values in t-tests like so (disregard sd values):

FGT_0_unweighted_vector <- c(rnorm(9710, mean = 0.28919, sd = sd_FGT_0))
FGT_0_cluster_vector <- c(rnorm(9710, mean = 0.33259, sd = sd_FGT_0_cluster))
t.test(FGT_0_cluster_vector, FGT_0_unweighted_vector, var.equal = FALSE)

Solution

  • When the poverty threshold is absolute, the FGT with g = 0 (the headcount ratio) is the mean of a binary variable (poor/non-poor), i.e. a proportion. The standard deviation of a binary variable is sqrt( p*(1-p) ).

    However, you are probably looking for the standard error (a measure of the sampling error of the FGT estimate). For that, just use SE( FGT_0_weighted ). That's what is used in t-tests.

    Taking stratification and clustering into account will alter the standard error estimates, while weighting will affect the mean (and all point estimates, like the FGT) as well. So using t-tests to check whether the mean estimate changes makes sense for comparing weighted and unweighted estimates.

    Working with sqrt(n) is misleading under complex sampling. The usual n is the so-called nominal sample size, but the effective sample size is usually smaller than that (because of cluster sampling).

    A concept related to what you are trying to do is the design effect, but that is not yet implemented for svyfgt (although, for absolute thresholds, you can still get it using svymean).
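
As a rough sketch of the points above, the headcount ratio can be computed as the survey mean of a poor/non-poor indicator, which gives access to the standard error and the design effect through svymean. This reuses the toy data from the question. The design with ids = ~1 (each row treated as its own sampling unit), the variable name poor, and the object names des, p_hat, se_hat, sd_hat, d_eff and n_eff are assumptions made for illustration, not part of the original code. Dividing the nominal n by the design effect is only a common approximation to the effective sample size.

```r
library(survey)

# Toy data from the question
wel   <- c(68008.19, 128504.61, 21347.69, 33272.95, 61828.96,
           32764.44, 92545.62, 58431.89, 95596.82, 117734.27)
rmul  <- c(rep(16, 7), rep(20, 3))
splin <- c(rep(23149.64, 5), rep(21322.23, 5))
survey.data <- data.frame(wel, rmul, splin)

# Assumption: no clustering, each row is its own sampling unit
des <- svydesign(ids = ~1, weights = ~rmul, data = survey.data)

# Poverty indicator: 1 if welfare is below the absolute threshold, else 0
des <- update(des, poor = as.numeric(wel < splin))

# FGT with g = 0 as a weighted proportion, requesting the design effect
fgt0 <- svymean(~poor, des, deff = TRUE)

p_hat  <- as.numeric(coef(fgt0))      # point estimate of the headcount ratio
se_hat <- as.numeric(SE(fgt0))        # design-based standard error
sd_hat <- sqrt(p_hat * (1 - p_hat))   # standard deviation of the binary variable
d_eff  <- as.numeric(deff(fgt0))      # design effect
n_eff  <- nrow(survey.data) / d_eff   # approximate effective sample size

fgt0
```

Here se_hat is the quantity that describes the sampling error of the estimate, while sd_hat describes the spread of the poor/non-poor indicator in the population. The two are not related by sqrt(n) once weights, strata, or clusters enter the design.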