
Randomly selecting 4 rows from different groups in an R data frame


I am working in R, and my data looks similar to this:

group  col_2  col_3   col_4
A      p_m     12      21
A      q_x     11      21
A      i_z     13      22
B      q_z     11      24
B      p_x     14      25
B      i_m     15      26
B      q_m     17      28
C      p_x     16      29
C      i_z     12      23
C      q_m     14      23
C      q_x     13      25 
D      p_z     11      25
D      i_z     15      26
D      q_m     17      28
D      q_x     14      29
E      p_x     13      30
E      i_m     15      26
E      q_m     17      28
E      p_x     16      29
F      i_z     12      23
F      q_x     13      25 
F      p_z     11      25
F      i_z     15      26
G      q_m     17      28
G      q_z     11      24
G      p_x     14      25
G      i_m     15      26
H      q_x     11      21
H      i_z     13      22
H      q_z     11      24
H      p_x     13      30
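To make the question reproducible, the table above can be entered as an R data frame. This is a transcription of the printed data, not the asker's original object. The name `mydata` matches the code below, and storing `col_3` and `col_4` as plain numeric (double) columns is an assumption about the original types.

```r
# Transcription of the sample data printed in the question (31 rows, 8 groups).
# Numeric column types are an assumption; the question does not show them.
mydata <- data.frame(
  group = rep(c("A", "B", "C", "D", "E", "F", "G", "H"),
              times = c(3, 4, 4, 4, 4, 4, 4, 4)),
  col_2 = c("p_m", "q_x", "i_z",
            "q_z", "p_x", "i_m", "q_m",
            "p_x", "i_z", "q_m", "q_x",
            "p_z", "i_z", "q_m", "q_x",
            "p_x", "i_m", "q_m", "p_x",
            "i_z", "q_x", "p_z", "i_z",
            "q_m", "q_z", "p_x", "i_m",
            "q_x", "i_z", "q_z", "p_x"),
  col_3 = c(12, 11, 13,
            11, 14, 15, 17,
            16, 12, 14, 13,
            11, 15, 17, 14,
            13, 15, 17, 16,
            12, 13, 11, 15,
            17, 11, 14, 15,
            11, 13, 11, 13),
  col_4 = c(21, 21, 22,
            24, 25, 26, 28,
            29, 23, 23, 25,
            25, 26, 28, 29,
            30, 26, 28, 29,
            23, 25, 25, 26,
            28, 24, 25, 26,
            21, 22, 24, 30),
  stringsAsFactors = FALSE
)
```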

I need to randomly select 4 rows such that each one comes from a different group. In other words, my output must not contain two observations that belong to the same group.

For example, one acceptable result would be:

group  col_2  col_3   col_4
A      i_z     13      22
H      i_z     13      22
D      q_m     17      28
F      p_z     11      25
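The requirement can be stated as a simple check: exactly 4 rows, with no group repeated. The helper below, `is_valid_sample()`, is hypothetical (it is not part of the question or of any package); here it is applied to the example result above.

```r
# Hypothetical helper: TRUE when `res` has exactly `n` rows and
# no value of `group` appears more than once.
is_valid_sample <- function(res, n = 4) {
  nrow(res) == n && !anyDuplicated(res$group)
}

# The example result shown in the question
example <- data.frame(
  group = c("A", "H", "D", "F"),
  col_2 = c("i_z", "i_z", "q_m", "p_z"),
  col_3 = c(13, 13, 17, 11),
  col_4 = c(22, 22, 28, 25)
)

is_valid_sample(example)                   # TRUE
is_valid_sample(example[c(1, 1, 2, 3), ])  # FALSE: group A appears twice
```

A check like this is handy for testing whichever sampling approach you settle on.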

I have tried the following:

set.seed(1234)
rndmData <- mydata %>%
  sample_n(5)

set.seed(1234)
rndmData <- mydata %>%
  sample_n(distinct(group), 5)

set.seed(1234)
rndmData <- mydata %>%
  sample_n(unique(group), 5)

However, none of these gives the desired result.

Any help would be great.


Solution

  • Sample 4 groups, then sample one row from within each group:

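    library(dplyr)  # slice_sample() needs dplyr >= 1.0.0

    set.seed(1234)  # optional: makes the random draw reproducible
    mydata %>%
      filter(group %in% sample(unique(group), size = 4)) %>%  # keep 4 random groups
      group_by(group) %>%
      slice_sample(n = 1) %>%  # draw one row from each remaining group
      ungroup()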
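For readers not using dplyr, the same two-step idea (sample 4 groups, then one row within each) can be sketched in base R. This is an alternative, not the answer's code. The stand-in `mydata` below keeps only the question's group sizes plus a hypothetical `row_id` column, since only `group` matters for the sampling.

```r
# Stand-in data with the question's group sizes (A has 3 rows, B to H have 4);
# `row_id` is a hypothetical column used only to identify the drawn rows.
mydata <- data.frame(
  group  = rep(LETTERS[1:8], times = c(3, 4, 4, 4, 4, 4, 4, 4)),
  row_id = 1:31
)

set.seed(1234)

# Step 1: pick 4 distinct groups at random
chosen <- sample(unique(mydata$group), size = 4)

# Step 2: for each chosen group, pick one of its row indices at random.
# sample.int() is used so a group with a single row is handled correctly.
rows <- vapply(chosen, function(g) {
  idx <- which(mydata$group == g)
  idx[sample.int(length(idx), 1)]
}, integer(1))

result <- mydata[rows, ]
result
```

Indexing with `sample.int(length(idx), 1)` rather than `sample(idx, 1)` avoids a documented pitfall of `sample()`: when given a single number n >= 1, it samples from `1:n` instead of returning that number.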