
How to execute a stratified sample


I need to generate a weighted stratified sample where both the input and the weightings can vary. The expected input is a variable-length factor of integers with a varying number of levels.

I'm trying to avoid hard-coding the weightings and strata, as they may vary. There are many questions on Stack Exchange about stratified sampling, but none that I could find avoid hard-coded values.

I'm still a bit new to R and have tried two methods: survey::svydesign() and sampling::balancedstratification(). Neither seems to accept a vector of frequency proportions to use as weightings.

variable_vector <- as.factor(c(1, 1, 1, 2, 2, 2, 2, 3)) 

freq_prop <- prop.table(table(variable_vector))


library(survey)

mysdesign <- svydesign(id = ~1,
                       strata = ~levels(variable_vector),
                       data = variable_vector,
                       fpc = freq_prop)

library(sampling)


sampling::balancedstratification(variable_vector,
                                 strata = levels(variable_vector),
                                 pik = freq_prop)

Neither of the above methods has worked.

Output from freq_prop is

[1] 0.375 0.500 0.125

Now I need a way to generate random samples of a given total size, for example 30, split across the strata in proportion to freq_prop:
sample size 1 = 30 * 0.375
sample size 2 = 30 * 0.500
sample size 3 = 30 * 0.125
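
The products above are not whole numbers (11.25, 15 and 3.75), so they have to be rounded before they can be used as sample sizes. The following is a minimal sketch of one way to do that, not a definitive method: it assumes largest-remainder rounding and sampling with replacement within each stratum (the example data has only 8 elements). The names n, raw_sizes, sizes and idx are illustrative and do not come from the question.

```r
# Example input from the question; any factor with any number of levels works.
variable_vector <- as.factor(c(1, 1, 1, 2, 2, 2, 2, 3))
freq_prop <- prop.table(table(variable_vector))

n <- 30  # total sample size (assumed for illustration)

# Raw per-stratum sizes as a plain named vector: 11.25, 15.00, 3.75
raw_sizes <- setNames(n * as.vector(freq_prop), names(freq_prop))

# Largest-remainder rounding: round down, then give the leftover units
# to the strata with the largest fractional parts so the total is n.
sizes <- floor(raw_sizes)
shortfall <- n - sum(sizes)
if (shortfall > 0) {
  bump <- order(raw_sizes - sizes, decreasing = TRUE)[seq_len(shortfall)]
  sizes[bump] <- sizes[bump] + 1
}

# Draw positions within each stratum. sample.int() is used on the pool
# length so that a stratum with a single element is handled correctly.
set.seed(777)
idx <- unlist(lapply(names(sizes), function(lv) {
  pool <- which(variable_vector == lv)
  pool[sample.int(length(pool), sizes[[lv]], replace = TRUE)]
}))

table(variable_vector[idx])
```

With this allocation the counts per stratum are fixed in advance and always sum to n; only which elements are drawn within each stratum is random.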

Solution

  • You can use base R's sample() to generate a random sample. For example, to draw a random sample of size 30 from the set {1, 2, 3}, with probabilities 0.375, 0.5 and 0.125 for 1, 2 and 3 respectively, we can do the following:

    set.seed(777)
    r_sample <- sample(c(1, 2, 3), size = 30, replace = TRUE, prob = c(0.375, 0.5, 0.125))
    table(r_sample)
    # r_sample
    #   1  2  3 
    #  13 14  3 
    
    

    Note that the counts per level only match the target proportions on average, not exactly in every draw. See ?sample for the help page.
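
Since the question asks to avoid hard-coded strata and weightings, the same sample() call can take both from the data. This is a minimal sketch under the assumption that the input is a factor like the question's variable_vector; the values and probabilities are read from levels() and prop.table() instead of being typed in.

```r
set.seed(777)  # only for reproducibility

# Example input from the question; any factor with any number of levels works.
variable_vector <- as.factor(c(1, 1, 1, 2, 2, 2, 2, 3))
freq_prop <- prop.table(table(variable_vector))

# Strata and weights both come from the data, nothing is hard-coded.
# table() orders its counts by factor level, so prob lines up with levels().
r_sample <- sample(levels(variable_vector),
                   size = 30,
                   replace = TRUE,
                   prob = as.vector(freq_prop))

# Keep every level in the tabulation, even one that was never drawn.
table(factor(r_sample, levels = levels(variable_vector)))
```

sample() returns the level labels as character strings here; wrap the result in factor(), as in the last line, if the original levels are needed.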