Tags: r, random, dplyr, sampling

Sampling a proportion from a population data frame in R (random sampling in stratified sampling)


I have a data frame (population) with 3 groups. I want:

A) to take 0.05% of each category, and B) to take a different proportion from each group.

My population data frame is:

category = c(rep("a",15),rep("b",30),rep("c",50))
num = c(rnorm(15,0,1),rnorm(30,5,1),rnorm(50,10,1))
pop = data.frame(category,num);pop

I am thinking of the sample_n() function from dplyr, but how can I take 0.05% of each group?

In the code below I take 5 elements at random from each group:

pop%>%
  group_by(category)%>%
  sample_n(size = 5)

And how can I change the allocation, say 0.05% from category a, 0.1% from b and 20% from c?


Solution

  • You can create a data frame that lists each category with its proportion, join it with pop, and use sample_n to select rows in each group according to that group's proportion.

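    library(dplyr)
    
    # Proportions are fractions of each group: 0.005 = 0.5%, 0.001 = 0.1%, 0.2 = 20%
    prop_table <- data.frame(category = c('a', 'b', 'c'), prop = c(0.005, 0.001, 0.2))
    
    pop %>%
      left_join(prop_table, by = 'category') %>%   # attach each row's group proportion
      group_by(category) %>%
      sample_n(n() * first(prop)) %>%              # group size times group proportion
      ungroup() %>%
      select(-prop)                                # drop the helper column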
    

    Note that sample_n has been superseded by slice_sample, but slice_sample needs a constant prop that is the same for every category and does not allow something like first(prop).
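For completeness, here is a minimal sketch that covers both parts of the question without sample_n. It assumes a dplyr version that provides slice_sample (1.0.0 or later). The seed, the object names same_prop and diff_prop, and the proportions 0.2, 0.1 and 0.3 are illustrative assumptions, not values from the question: with groups of 15, 30 and 50 rows, a proportion such as 0.05% works out to far less than one row (15 × 0.0005 = 0.0075), so larger values are used to make the result visible. Part B uses slice() with sample() instead of slice_sample, and rounding the per-group size with round() is a design choice of this sketch.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the draw is reproducible

# Same population as in the question
pop <- data.frame(
  category = c(rep("a", 15), rep("b", 30), rep("c", 50)),
  num = c(rnorm(15, 0, 1), rnorm(30, 5, 1), rnorm(50, 10, 1))
)

# A) one constant proportion applied to every group
same_prop <- pop %>%
  group_by(category) %>%
  slice_sample(prop = 0.2) %>%
  ungroup()

# B) a different proportion per group (illustrative values)
prop_table <- data.frame(category = c("a", "b", "c"),
                         prop = c(0.2, 0.1, 0.3))

diff_prop <- pop %>%
  left_join(prop_table, by = "category") %>%
  group_by(category) %>%
  # sample(n(), k) draws k row positions from 1..n() within the group
  slice(sample(n(), size = round(n() * first(prop)))) %>%
  ungroup() %>%
  select(-prop)

# Rows drawn per group
count(same_prop, category)
count(diff_prop, category)
```

With these group sizes, part A draws 3, 6 and 10 rows from a, b and c, and part B draws 3, 3 and 15. Any group whose size times proportion rounds to zero contributes no rows, so very small proportions only make sense for much larger groups.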