I want to draw clusters (defined by the variable id) with replacement from a dataset, and in contrast to previously answered questions, I want clusters that are chosen K times to have each observation repeated K times. That is, I'm doing cluster bootstrapping.
For example, the following samples id=1 twice, but the observations for id=1 appear only once in the new dataset s. I want all observations from id=1 to appear twice.
f <- data.frame(id=c(1, 1, 2, 2, 2, 3, 3), X=rnorm(7))
set.seed(451)
new.ids <- sample(unique(f$id), replace=TRUE)
s <- f[f$id %in% new.ids, ]
One option would be to lapply over each element of new.ids, collect the matching rows in a list, and then stack them all together:
library(data.table)
rbindlist(lapply(new.ids, function(x) f[f$id %in% x,]))
# id X
#1: 1 1.20118333
#2: 1 -0.01280538
#3: 1 1.20118333
#4: 1 -0.01280538
#5: 3 -0.07302158
#6: 3 -1.26409125
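If you would rather avoid the data.table dependency, the same idea can be sketched in base R by indexing rows instead of binding data frames. This is only a sketch under assumptions: the toy X values and the fixed new.ids are made up for illustration, and boot.id is a hypothetical column name added so that two draws of the same cluster can still be told apart (which is often useful when fitting cluster-level models to the bootstrap sample).

```r
# Toy data mirroring the example above (X values are arbitrary, not the originals).
f <- data.frame(id = c(1, 1, 2, 2, 2, 3, 3), X = 1:7)

# Assume these ids were drawn with replacement; id 1 was drawn twice.
new.ids <- c(1, 1, 3)

# Group row indices by cluster, then look up each drawn id in turn.
rows.by.id <- split(seq_len(nrow(f)), f$id)
picked <- rows.by.id[as.character(new.ids)]

# Stack the selected rows; a cluster drawn twice contributes its rows twice.
s <- f[unlist(picked, use.names = FALSE), ]

# boot.id (hypothetical name) numbers the draws so repeated clusters stay distinct.
s$boot.id <- rep(seq_along(picked), lengths(picked))
rownames(s) <- NULL
s
```

Because the lookup goes through the list names, a repeated id simply selects the same row indices again, which is what f$id %in% new.ids cannot do: %in% only tests membership and ignores how often an id was drawn.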