
Select subgroups with replacement in an R data frame


I have seen many similar questions (for example, selecting rows within the groups of a data frame), but that is not what I need.

Is there a quick and easy way to select whole groups, with replacement, from a data frame?

R Code Example:

> df = cbind(id = 1:10, groups = sample(1:3, 10, replace = TRUE))
> df
      id groups
 [1,]  1      3
 [2,]  2      3
 [3,]  3      2
 [4,]  4      2
 [5,]  5      1
 [6,]  6      3
 [7,]  7      1
 [8,]  8      2
 [9,]  9      1
[10,] 10      1

I need to randomly select 3 groups, with replacement, from the 3 available groups. For example, if the selection is groups 1, 1 and 2, the final data frame would be the following:

> rbind(df[ df[,'groups'] == 1, ], df[ df[,'groups'] == 1, ], df[ df[,'groups'] == 2, ])
      id groups
 [1,]  5      1
 [2,]  7      1
 [3,]  9      1
 [4,] 10      1
 [5,]  5      1
 [6,]  7      1
 [7,]  9      1
 [8,] 10      1
 [9,]  3      2
[10,]  4      2
[11,]  8      2

How should I proceed?

NB: My data frame has many variables. I need the final result to be a complete data frame containing the selected individuals.


Solution

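  • First draw your sample of groups. Note that df as built with cbind() in the question is a matrix, so df$groups would fail on it; indexing with df[, "groups"] works for both a matrix and a data frame:

    x <- sample(unique(df[, "groups"]), 3, replace = TRUE)

    Then select the matching rows of df for each sampled group and stack them, so that a group drawn twice appears twice:

    do.call(rbind, lapply(x, function(i) df[df[, "groups"] == i, , drop = FALSE]))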
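
    The same idea can be sketched end to end for a real data frame. This is only one possible variant, not the original answer's code: it assumes the data are stored with data.frame() rather than cbind(), uses an arbitrary seed for reproducibility, and the names by_group, picked and result are made up for illustration. It splits the data once with split() and samples the group labels as names, which also sidesteps sample()'s special handling of a single numeric value.

```r
set.seed(42)  # arbitrary seed, assumed here only so the run is reproducible

# Data frame version of the example data from the question
df <- data.frame(id = 1:10, groups = sample(1:3, 10, replace = TRUE))

# Split once into a list of per-group data frames, named by group label
by_group <- split(df, df$groups)

# Draw 3 group labels with replacement. Sampling the character names avoids
# sample()'s behaviour of treating a single number n as 1:n.
picked <- sample(names(by_group), 3, replace = TRUE)

# Stack the selected groups; a group drawn twice is included twice
result <- do.call(rbind, unname(by_group[picked]))
rownames(result) <- NULL  # reset the row names after stacking

print(picked)
print(result)
```

    All columns of df are carried along, so this should also work when the data frame has many variables.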