I am using plyr to perform a bootstrapping function on subsets of a dataset.
Because the boot function creates a list object, I am currently using dlply to store the output of the function, then a ddply to get just the parts of the bootfunction that I want out
My example dataset is as follows:
dat = data.frame(x = rnorm(10, sd = 1),y = rnorm(10, sd = 1),z = rep(c("SppA", "SppB", "SppC", "SppD", "SppE"), 2),u = rep(c("SiteA", "SiteB"), 5))
the exact function isn't terribly important, but for the sake of reproducibility, here is the function I'm using:
boot_fun = function(x,i) {
i = sample(1:24, 24, replace = TRUE)
ts1 = mean(x[i,"x"])
ts2 = sample(x[i,"y"])
mean(ts1) - mean(ts2)
}
My plyr function is the following:
temp = dlply(dat, c("z", "u"), summarise, boot_object = boot(dat, boot_fun, R = 1000))
Since what I want out of the boot object is the mean and CI, I then perform the following plyr function:
temp2 = ldply(temp, summarise, mean = mean(boot$t), lowCI = quantile(boot$t, 0.025), highCI = quantile(boot$t, 0.975))
This works and accomplishes exactly what I want it to (although with an error about subsetting which doesn't seem to affect anything I care about), but I feel like there should be a way to skip the intermediate dlply step.
-edit- to clarify on what I'm trying to do if I didn't need to be splitting the groups
If I was manually splitting instead of using plyr, it would look something like the following:
temp = boot(dat[dat$z == "SppA" & dat$u == "SiteA",], boot_fun, R = 1000)
temp2$mean = mean(temp$t)
temp2$lowCI = quantile(temp$t, 0.025)
temp2$highCI = quantile(temp$t, 0.975)
If I didn't care about the groups at all and just wanted to do this to the whole group it would be something like
temp = boot(dat, boot_fun, R = 1000)
temp2$mean = mean(temp$t)
temp2$lowCI = quantile(temp$t, 0.025)
temp2$highCI = quantile(temp$t, 0.975)
Your example is not reproducible by me.
When I do temp = boot(dat, boot_fun, R = 1000)
, I get a WARNING
:
ORDINARY NONPARAMETRIC BOOTSTRAP
Call:
boot(data = dat, statistic = boot_fun, R = 1000)
Bootstrap Statistics :
WARNING: All values of t1* are NA
I think your current code is pretty efficient, but if you're looking for other possibilities, you could try tidyverse
to 1) group_by
the relevant columns, 2) nest
the relevant data for bootstrapping, 3) run your bootstrap with the nested data, 4) isolate the statistics you desire, then 5) return to a normal data frame
library(boot)
library(tidyverse)
dat1 <- dat %>%
group_by(z,u) %>%
nest() %>%
mutate(data=map(data,~boot(.x, boot_fun, R=1000))) %>%
mutate(data=map(data,~data.frame(mean=mean(.x$t), lowCI=quantile(.x$t, 0.025), highCI=quantile(.x$t,0.975)))) %>%
unnest(data)