
Resample with replacement by cluster


I want to draw clusters (defined by the variable id) with replacement from a dataset, and in contrast to previously answered questions, I want clusters that are chosen K times to have each observation repeated K times. That is, I'm doing cluster bootstrapping.

For example, the following draws id=1 twice, but the observations for id=1 appear only once in the new dataset s. I want all observations from id=1 to appear twice.

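f <- data.frame(id=c(1, 1, 2, 2, 2, 3, 3), X=rnorm(7))
set.seed(451)
# Draw cluster ids with replacement
new.ids <- sample(unique(f$id), replace=TRUE)
# %in% only tests membership, so an id drawn twice still contributes its rows once
s <- f[f$id %in% new.ids, ]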

Solution

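  • One option is to lapply over each element of new.ids, extract the rows for that id, and collect the results in a list. Because a repeated id is processed once per draw, its rows are extracted once per draw. Then you can stack the list into a single table: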

    library(data.table)
    # One subset of f per drawn id (repeats included), stacked row-wise
    rbindlist(lapply(new.ids, function(x) f[f$id %in% x,]))
    #  id           X
    #1:  1  1.20118333
    #2:  1 -0.01280538
    #3:  1  1.20118333
    #4:  1 -0.01280538
    #5:  3 -0.07302158
    #6:  3 -1.26409125
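
  • The same idea can be sketched in base R without data.table, by indexing rows instead of stacking data frames. This is only an illustration under assumptions: rows.by.id, idx and the draw column are hypothetical names that do not appear in the question, and the draw column is an optional extra that keeps repeated clusters distinguishable after resampling.

```r
# Toy data mirroring the question: three clusters of unequal size
set.seed(451)
f <- data.frame(id = c(1, 1, 2, 2, 2, 3, 3), X = rnorm(7))

# Draw cluster ids with replacement
new.ids <- sample(unique(f$id), replace = TRUE)

# Row indices of f grouped by cluster; list names are the ids as strings
rows.by.id <- split(seq_len(nrow(f)), f$id)

# Look up the rows of each drawn id in turn; an id drawn K times is looked up K times
drawn <- rows.by.id[as.character(new.ids)]
idx <- unlist(drawn, use.names = FALSE)
s <- f[idx, ]

# Assumed extra (not in the question): label each draw so that copies of the
# same cluster can be treated as separate clusters downstream
s$draw <- rep(seq_along(new.ids), times = lengths(drawn))
```

    Indexing by row number avoids building one data frame per draw, which may matter when the resampling is repeated many times. Grouping on draw rather than id treats each copy of a cluster as its own cluster.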