
Sample all rows of N groups


I'm trying to find a way to sample N whole groups from a dataframe.

For example, if we had the below dataframe:

   group value
1      a     1
2      a     2
3      a     3
4      b     4
5      b     5
6      c     6
7      d     7
8      d     8
9      d     9
10     d    10

Code to reproduce it:

df <- data.frame(group = c(rep("a", 3),
                           rep("b", 2),
                           "c",
                           rep("d", 4)),
                 value = 1:10)

If we wanted to sample n = 2 groups, I'd like my output to be something like:

  group value
1     a     1
2     a     2
3     a     3
4     c     6

if, for example, the n = 2 groups selected were a and c.

I tried using group_by(group) %>% slice_sample(n = 2), but that samples two rows from every group rather than returning every row of two groups, which is what I'm after.
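
To make the difference concrete, here is a minimal sketch on the example data, assuming dplyr 1.0 or later (where slice_sample() was introduced). The names per_group, chosen and whole_groups and the seed are arbitrary choices for illustration. A grouped slice_sample() draws rows inside each group, and dplyr silently truncates n to the group size, so group c contributes its single row. Sampling whole groups instead means drawing from the distinct group labels and then keeping the matching rows with semi_join().

```r
library(dplyr)

df <- data.frame(group = c(rep("a", 3), rep("b", 2), "c", rep("d", 4)),
                 value = 1:10)

# The attempt from the question: up to 2 rows *within* each group.
# Group "c" has only one row, so the result has 2 + 2 + 1 + 2 = 7 rows.
per_group <- df %>%
  group_by(group) %>%
  slice_sample(n = 2) %>%
  ungroup()

# Sampling the groups themselves: draw 2 distinct group labels...
set.seed(42)
chosen <- df %>%
  distinct(group) %>%
  slice_sample(n = 2)

# ...then keep every row of df whose group was drawn.
whole_groups <- df %>%
  semi_join(chosen, by = "group")
```

Because semi_join() only filters df, the rows keep their original order and columns.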

Ideally the solution would use the tidyverse, though it might require writing a new function.
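
If this is needed often, one option is to wrap the "sample the labels, then keep their rows" idea in a small function. sample_groups() below is a hypothetical helper, not something dplyr provides; it assumes a recent dplyr (1.0 or later) for the {{ }} embracing operator and slice_sample(). It ungroups before drawing so that any existing grouping does not turn the draw into a per-group sample.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): keep all rows of `size` randomly
# chosen groups, where the grouping column is passed unquoted.
sample_groups <- function(data, group_col, size) {
  chosen <- data %>%
    ungroup() %>%                  # ignore existing grouping so each label counts once
    distinct({{ group_col }}) %>%  # one row per distinct group value
    slice_sample(n = size)         # pick `size` of those rows at random
  # Keep every row of `data` whose group value was picked
  semi_join(data, chosen, by = names(chosen))
}

df <- data.frame(group = c(rep("a", 3), rep("b", 2), "c", rep("d", 4)),
                 value = 1:10)

set.seed(123)
sample_groups(df, group, 2)
```

The parameter is called size rather than n only to avoid any confusion with dplyr's n() helper.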

Thank you!


Solution

  • library(dplyr)
    set.seed(123)
    
    # Draw 2 distinct group labels at random, then keep every row in those groups
    filter(df, group %in% sample(unique(df$group), 2))
    

      group value
    1     c     6
    2     d     7
    3     d     8
    4     d     9
    5     d    10
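
For comparison, the same idea works in base R without any packages: sample from the unique labels, then subset with %in%. This is a minimal sketch; picked and base_result are illustrative names. Since it makes the same sample() call after the same set.seed(123), it draws the same two groups as the filter() call above.

```r
df <- data.frame(group = c(rep("a", 3), rep("b", 2), "c", rep("d", 4)),
                 value = 1:10)

set.seed(123)
# Draw 2 of the distinct group labels, then keep all rows in those groups
picked <- sample(unique(df$group), 2)
base_result <- df[df$group %in% picked, ]
```

One general R caveat: if the vector passed to sample() is a single number x, sample() draws from 1:x instead; with character labels like these that cannot happen.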