I want to bootstrap a panel dataset manually. I need to cluster at the individual level so that later manipulations stay consistent; that is, all observations for the same individual must be selected together in a bootstrap sample. To do this, I resample with replacement from the vector of unique individual IDs and use the result as the index.
df <- data.frame(ID = c("A","A","A","B","B","B","C","C","C"), v1 = c(3,1,2,4,2,2,5,6,9), v2 = c(1,0,0,0,1,1,0,1,0))
boot.index <- sample(unique(df$ID), replace = TRUE)
Then I select rows according to the index. Suppose boot.index = (B, B, C); I want a data frame like this:
ID v1 v2
B 4 0
B 2 1
B 2 1
B 4 0
B 2 1
B 2 1
C 5 0
C 6 1
C 9 0
Apparently df1 <- df[df$ID == boot.index,] does not give what I want. I also tried subset and filter in dplyr, but neither works. Basically, this is an issue of selecting whole groups by a group index. Any suggestions? Thanks!
set.seed(42)
boot.index <- sample(unique(df$ID), replace = TRUE)
boot.index
#[1] C C A
#Levels: A B C
# For each sampled ID, take all of its rows, then stack the pieces in draw order
do.call(rbind, lapply(boot.index, function(x) df[df$ID == x,]))
# ID v1 v2
#7 C 5 0
#8 C 6 1
#9 C 9 0
#71 C 5 0
#81 C 6 1
#91 C 9 0
#1 A 3 1
#2 A 1 0
#3 A 2 0
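A possible variation on the approach above, as a minimal sketch rather than a definitive implementation: group the row numbers by ID once with split(), then look up each sampled ID. This avoids rescanning df$ID for every draw, which may matter with many individuals or many bootstrap replications. The names rows.by.id, picked, boot.df and the extra draw column are hypothetical, introduced here only for illustration. boot.index is fixed to the question's (B, B, C) example instead of being sampled, because the result of sample() for a given seed can differ across R versions.

```r
df <- data.frame(
  ID = c("A","A","A","B","B","B","C","C","C"),
  v1 = c(3,1,2,4,2,2,5,6,9),
  v2 = c(1,0,0,0,1,1,0,1,0)
)

# Row numbers of df grouped by ID, computed once
rows.by.id <- split(seq_len(nrow(df)), df$ID)

# Fixed draw for illustration; in practice use
# sample(unique(df$ID), replace = TRUE)
boot.index <- c("B", "B", "C")

# Look up the rows of each drawn ID, keeping the draw order.
# as.character() makes the lookup go by name even if ID is a factor.
picked <- rows.by.id[as.character(boot.index)]
boot.df <- df[unlist(picked, use.names = FALSE), ]

# Hypothetical extra column: number each draw so that an ID drawn
# twice can still be treated as two separate clusters later on
boot.df$draw <- rep(seq_along(picked), lengths(picked))
rownames(boot.df) <- NULL
```

If the later manipulation groups by individual, grouping by draw instead of ID keeps the two copies of B apart; whether that is needed depends on the estimator being bootstrapped.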