I have seen many similar solutions (selecting rows within groups of a data frame, etc.), but that is not what I need.
Is there a quick and easy way to select whole groups, with replacement, from a data frame?
R Code Example:
> df = cbind(id = 1:10, groups = sample(1:3, 10, replace = TRUE))
> df
      id groups
 [1,]  1      3
 [2,]  2      3
 [3,]  3      2
 [4,]  4      2
 [5,]  5      1
 [6,]  6      3
 [7,]  7      1
 [8,]  8      2
 [9,]  9      1
[10,] 10      1
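Note that cbind() on two plain vectors builds a matrix, not a data frame, so column access with $ (as in df$groups) raises an error on this toy object, while df[, "groups"] works on both. A quick check, converting with as.data.frame() if data-frame behaviour is wanted (the name m is only used for this illustration):

```r
# Toy object built exactly as in the question
m <- cbind(id = 1:10, groups = sample(1:3, 10, replace = TRUE))
is.matrix(m)       # TRUE: cbind() of plain vectors returns a matrix
is.data.frame(m)   # FALSE, so m$groups would raise an error
# Convert if data-frame behaviour is wanted; column names are kept
df <- as.data.frame(m)
is.data.frame(df)  # TRUE: df$groups now works
```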
I need to randomly select 3 groups, with replacement, from the 3 available groups, and keep all rows of each selected group. For example, if the draw is groups 1, 1 and 2, the final data frame should be:
> rbind(df[ df[,'groups'] == 1, ], df[ df[,'groups'] == 1, ], df[ df[,'groups'] == 2, ])
      id groups
 [1,]  5      1
 [2,]  7      1
 [3,]  9      1
 [4,] 10      1
 [5,]  5      1
 [6,]  7      1
 [7,]  9      1
 [8,] 10      1
 [9,]  3      2
[10,]  4      2
[11,]  8      2
How should I proceed?
NB: My real data frame has many variables. At the end I need a complete data frame with all the variables for the selected individuals.
You could first draw the sample of group labels:
x <- sample(unique(df[, "groups"]), 3, replace = TRUE)
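One caveat with this step: if the grouping column happens to contain a single numeric value, sample() treats it as the range 1:value rather than as a one-element set. The ?sample help page shows a pattern that indexes with sample.int() instead; a minimal sketch, where resample() is a helper defined here, not a base R function:

```r
# Pattern from the ?sample examples: sample positions, then index into x,
# so a length-one x is never reinterpreted as 1:x.
resample <- function(x, ...) x[sample.int(length(x), ...)]

# Pitfall: a lone numeric value is treated as the range 1:value.
sample(5, 3, replace = TRUE)     # draws from 1:5, not from c(5)
resample(5, 3, replace = TRUE)   # always c(5, 5, 5)

# With several groups it behaves like sample() on the unique labels.
groups <- c(3, 3, 2, 2, 1, 3, 1, 2, 1, 1)  # 'groups' column from the question
x <- resample(unique(groups), 3, replace = TRUE)
```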
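Then stack, in order, the rows of df that belong to each sampled group:
do.call(rbind, lapply(x, function(i) df[df[, "groups"] == i, , drop = FALSE]))
Indexing with df[, "groups"] rather than df$groups works both for the matrix built with cbind() in the question and for a real data frame, and drop = FALSE keeps a group with a single row as a row instead of collapsing it to a vector.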
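For the real data frame with many variables, the same idea can be wrapped in a small function. This is a sketch, assuming data is a data.frame and group_col names its grouping column; sample_groups() and the added draw column are hypothetical choices made for this example, not part of base R or any package. It collects row indices and subsets once instead of calling rbind() on many pieces, and numbers each draw so that a group picked twice can still be told apart (useful, for example, when resampling clusters in a bootstrap).

```r
# Hypothetical helper (not from any package): resample whole groups with
# replacement and return every column for the selected rows.
# Assumes 'data' is a data.frame and 'group_col' is the grouping column name.
sample_groups <- function(data, group_col, n = length(unique(data[[group_col]]))) {
  ids <- unique(data[[group_col]])
  # Draw group labels by position, avoiding the sample() length-one pitfall.
  drawn <- ids[sample.int(length(ids), n, replace = TRUE)]
  # For each draw, the row numbers of that group (draw order is kept).
  rows <- lapply(drawn, function(g) which(data[[group_col]] == g))
  out <- data[unlist(rows), , drop = FALSE]
  # Number each draw so two copies of the same group stay distinguishable.
  out$draw <- rep(seq_along(drawn), lengths(rows))
  rownames(out) <- NULL
  out
}

# Example data frame with an extra variable to show all columns are kept.
df <- data.frame(id = 1:10,
                 groups = c(3, 3, 2, 2, 1, 3, 1, 2, 1, 1),
                 score = seq(0.1, 1, by = 0.1))
set.seed(123)  # only to make the illustration reproducible
res <- sample_groups(df, "groups", n = 3)
res
```

Each block of rows in res sharing a draw value is one complete copy of a sampled group, so the result can be analysed like the original data while still recording which draw each row came from.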