r data.table data-manipulation statistics-bootstrap

R bootstrap statistics by group for big data


I want to bootstrap a data set that has groups in it. A simple scenario would be bootstrapping simple means:

library(data.table)
library(boot)

data <- as.data.table(list(x1 = runif(200), x2 = runif(200), group = runif(200)>0.5))
stat <- function(x, i) {x[i, c(m1 = mean(x1), m2 = mean(x2)), by = "group"]}
boot(data, stat, R = 10)

This gives me the error "incorrect number of subscripts on matrix" because of the by = "group" part. I managed to work around it by subsetting the data by group, but I don't like this solution. Is there a simpler way to make this kind of task work?

In particular, can I add an extra argument to the statistic function, such as stat(x, i, groupvar), and pass it through the boot function, for example boot(data, stat(groupvar = group), R = 100)?


Solution

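  • This should do it, provided stat no longer contains by = "group" (the grouping moves to the outer data.table call, and .SD does not include the grouping column). $V1 is then a list with one boot object per group: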

    data[, list(list(boot(.SD, stat, R = 10))), by = group]$V1
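
For completeness, here is a minimal end-to-end sketch of the per-group approach, plus one possible way to pass the grouping column as an extra argument, which the question also asks about. It relies on boot() forwarding additional named arguments to the statistic through `...`. The names per_group, stat_by and pooled, the fixed seed, and the use of strata to resample within groups are assumptions made for illustration, not part of the original answer.

```r
library(data.table)
library(boot)

set.seed(42)  # assumption: fixed seed, only so the sketch is reproducible
data <- data.table(x1 = runif(200), x2 = runif(200), group = runif(200) > 0.5)

# Statistic without `by`: boot() needs a plain numeric vector back.
stat <- function(x, i) {
  x[i, c(m1 = mean(x1), m2 = mean(x2))]
}

# Variant 1: run boot() once per group and keep only a small summary.
# `per_group` and its column names are hypothetical.
per_group <- data[, {
  b <- boot(.SD, stat, R = 100)
  list(name = c("m1", "m2"),
       estimate = as.numeric(b$t0),   # statistic on the original rows
       se = apply(b$t, 2, sd))        # bootstrap standard error per column
}, by = group]

# Variant 2: a single boot() call with the grouping column passed as an
# extra argument. keyby sorts the groups, so the output order is stable.
stat_by <- function(x, i, groupvar) {
  m <- x[i, .(m1 = mean(x1), m2 = mean(x2)), keyby = groupvar]
  c(m$m1, m$m2)  # m1 for each group, then m2 for each group
}

# strata keeps resampling within each group, so every replicate
# contains both groups and the statistic always has length 4.
pooled <- boot(data, stat_by, R = 100,
               strata = as.integer(data$group), groupvar = "group")
```

Variant 1 returns a small table with one row per group and statistic, which is usually easier to work with on large data than a list of full boot objects. In variant 2, pooled$t has one column per group and statistic combination, in the order produced by stat_by.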