I have a data frame (population) with 3 groups. I want to:
A) take 0.05% of each category, and
B) take a different proportion from each group.
My population data frame is:
category = c(rep("a",15),rep("b",30),rep("c",50))
num = c(rnorm(15,0,1),rnorm(30,5,1),rnorm(50,10,1))
pop = data.frame(category,num);pop
I am thinking of the sample_n() function from dplyr, but how can I take 0.05% of each group?
In the code below I take 5 elements at random from each group:
library(dplyr)
pop %>%
  group_by(category) %>%
  sample_n(size = 5)
And how can I change the allocation to a different proportion per group, say 0.05% from category a, 0.1% from b and 20% from c?
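For A), where every group gets the same proportion, a constant prop is enough and no lookup table is needed. A minimal sketch, assuming dplyr 1.0.0 or later (where slice_sample() exists). The 20% is an illustrative value rather than the 0.05% from the question, because 0.05% of groups of 15 to 50 rows is far less than one row, and set.seed() is added only to make the draw reproducible.

```r
library(dplyr)

set.seed(1)  # added only so the random draw is reproducible
category <- c(rep("a", 15), rep("b", 30), rep("c", 50))
num <- c(rnorm(15, 0, 1), rnorm(30, 5, 1), rnorm(50, 10, 1))
pop <- data.frame(category, num)

# The same proportion (20% here, an illustrative value) is drawn within every group
sampled <- pop %>%
  group_by(category) %>%
  slice_sample(prop = 0.2) %>%
  ungroup()

count(sampled, category)  # expected: a 3, b 6, c 10 rows
```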
You can create a data frame with each category and its proportion, join it with pop, and use sample_n() to select rows in each group according to that group's proportion.
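library(dplyr)
# One row per category with the fraction to sample (0.05% = 0.0005, 0.1% = 0.001, 20% = 0.2)
prop_table <- data.frame(category = c('a', 'b', 'c'), prop = c(0.0005, 0.001, 0.2))
pop %>%
  left_join(prop_table, by = 'category') %>%   # attach each row's group proportion
  group_by(category) %>%
  sample_n(n() * first(prop)) %>%             # per group: group size times its proportion
  ungroup() %>%
  select(-prop)                               # drop the helper column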
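It helps to check how many rows each proportion actually yields. With the question's percentages converted to fractions, n() * first(prop) is well below 1 for categories a and b. As far as I can tell, sample_n() passes that size on to sample.int(), which drops the fractional part, so those groups come back empty. A base-R sketch of the arithmetic, with the group sizes (15, 30, 50) taken from the question's data:

```r
# Group sizes from the question's data
sizes <- c(a = 15, b = 30, c = 50)
# Percentages from the question as fractions: 0.05%, 0.1%, 20%
props <- c(a = 0.0005, b = 0.001, c = 0.2)

target <- sizes * props  # expected rows per group: 0.0075, 0.03, 10
rows <- floor(target)    # whole rows once the fractional part is dropped: 0, 0, 10
rows
```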
Note that sample_n() has been superseded by slice_sample(), but slice_sample() needs a single fixed prop value for all categories and does not accept a per-group expression such as first(prop).
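One way to keep a per-group proportion without sample_n() is to move the same expression into slice(), which evaluates its arguments once per group, so n() and first(prop) refer to the current group. This is a sketch of a workaround, not a documented slice_sample() feature. The proportions in prop_table are hypothetical values chosen so every group returns rows, floor() makes the rounding explicit, and set.seed() is only there for reproducibility.

```r
library(dplyr)

set.seed(1)  # added only so the random draw is reproducible
category <- c(rep("a", 15), rep("b", 30), rep("c", 50))
num <- c(rnorm(15, 0, 1), rnorm(30, 5, 1), rnorm(50, 10, 1))
pop <- data.frame(category, num)

# Hypothetical per-group proportions, picked so every group returns at least one row
prop_table <- data.frame(category = c("a", "b", "c"), prop = c(0.2, 0.1, 0.5))

sampled <- pop %>%
  left_join(prop_table, by = "category") %>%
  group_by(category) %>%
  # slice() runs once per group: pick floor(group size * proportion) distinct row positions
  slice(sample.int(n(), floor(n() * first(prop)))) %>%
  ungroup() %>%
  select(-prop)

count(sampled, category)  # expected: a 3, b 3, c 25 rows
```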