In this data frame:
df <- data.frame(
Story = c(rep("C", 6), rep("X", 9), rep("A", 15), rep("B",12))
)
I want to randomly sample roughly 33% of the rows within each Story. This proves harder than I thought. This method, for example, using ceiling and slice_sample, does not get the desired result:
df %>%
group_by(Story) %>%
mutate(ID = row_number()) %>%
mutate(sample_size = ceiling(n() * 0.33)) %>%
slice_sample(n = unique(sample_size))
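To make the target concrete, here is a small sketch, assuming dplyr is installed, that only computes how many rows the ceiling(n() * 0.33) rule asks for in each Story. It does not sample anything; the column name sample_size is borrowed from the code above.

```r
library(dplyr)

df <- data.frame(
  Story = c(rep("C", 6), rep("X", 9), rep("A", 15), rep("B", 12))
)

# Group sizes per Story (count() returns the groups sorted alphabetically),
# plus the number of rows the ceiling(n * 0.33) rule would keep
targets <- df %>%
  count(Story) %>%
  mutate(sample_size = ceiling(n * 0.33))

targets
```

For this data that works out to 5 rows of A, 4 of B, 2 of C and 3 of X.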
The desired result has:
What about just prop = 1/3 with slice_sample?
> df %>%
+ slice_sample(prop = 1 / 3, by = Story)
Story
1 C
2 C
3 X
4 X
5 X
6 A
7 A
8 A
9 A
10 A
11 B
12 B
13 B
14 B
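A reproducible version of the same call, as a sketch: set.seed(1) is an arbitrary choice added only so the draw repeats, and the by argument of slice_sample needs dplyr 1.1.0 or later. Counting the result confirms the per-group sizes.

```r
library(dplyr)

df <- data.frame(
  Story = c(rep("C", 6), rep("X", 9), rep("A", 15), rep("B", 12))
)

set.seed(1)  # arbitrary seed, only to make the random draw reproducible

# Keep one third of the rows within each Story (per-group sampling via `by`)
sampled <- df %>%
  slice_sample(prop = 1 / 3, by = Story)

# Number of sampled rows per Story
counts <- sampled %>% count(Story)
counts
```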
Or, if you'd like to use 0.33 and ceiling:
> df %>%
+ filter(row_number() %in% sample(n(), ceiling(n() * 0.33)), .by = Story)
Story
1 C
2 C
3 X
4 X
5 X
6 A
7 A
8 A
9 A
10 A
11 B
12 B
13 B
14 B
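Both answers return the same counts here only because every group size is a multiple of 3. The dplyr documentation states that prop is rounded towards zero, so the two rules can diverge for other sizes. The sketch below compares them with plain arithmetic. The sizes hyp7 and hyp10 are hypothetical groups that are not in the question's data.

```r
# Group sizes from the question, plus two hypothetical sizes (hyp7, hyp10)
sizes <- c(C = 6, X = 9, A = 15, B = 12, hyp7 = 7, hyp10 = 10)

# slice_sample(prop = 1/3): documented to round towards zero, i.e. floor()
prop_rule <- floor(sizes * (1 / 3))

# filter() answer: ceiling(n() * 0.33)
ceiling_rule <- ceiling(sizes * 0.33)

rbind(prop_rule, ceiling_rule)
```

For a group of 7 rows, prop = 1/3 keeps 2 rows while ceiling(7 * 0.33) keeps 3. For a group of 10, the counts are 3 and 4. Pick whichever rounding matches what "roughly 33%" should mean for small groups.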