Avoiding intermediate dlply step when starting with a dataframe and ending with a dataframe

I am using plyr to perform a bootstrapping function on subsets of a dataset.

Because the boot function creates a list object, I am currently using dlply to store the output of the function, then a ddply to get just the parts of the bootfunction that I want out

My example dataset is as follows:

dat = data.frame(x = rnorm(10, sd = 1),y = rnorm(10, sd = 1),z = rep(c("SppA", "SppB", "SppC", "SppD", "SppE"), 2),u = rep(c("SiteA", "SiteB"), 5))

the exact function isn't terribly important, but for the sake of reproducibility, here is the function I'm using:

boot_fun = function(x,i) {
  i = sample(1:24, 24, replace = TRUE)
  ts1 = mean(x[i,"x"])
  ts2 = sample(x[i,"y"])
  mean(ts1) - mean(ts2)
}

My plyr function is the following:

temp = dlply(dat, c("z", "u"), summarise, boot_object = boot(dat, boot_fun, R = 1000))

Since what I want out of the boot object is the mean and CI, I then perform the following plyr function:

temp2 = ldply(temp, summarise, mean = mean(boot$t), lowCI = quantile(boot$t, 0.025), highCI = quantile(boot$t, 0.975))

This works and accomplishes exactly what I want it to (although with an error about subsetting which doesn't seem to affect anything I care about), but I feel like there should be a way to skip the intermediate dlply step.

-edit- to clarify on what I'm trying to do if I didn't need to be splitting the groups

If I was manually splitting instead of using plyr, it would look something like the following:

temp = boot(dat[dat$z == "SppA" & dat$u == "SiteA",], boot_fun, R = 1000)
temp2$mean = mean(temp$t)
temp2$lowCI = quantile(temp$t, 0.025)
temp2$highCI = quantile(temp$t, 0.975)

If I didn't care about the groups at all and just wanted to do this to the whole group it would be something like

temp = boot(dat, boot_fun, R = 1000)
temp2$mean = mean(temp$t)
temp2$lowCI = quantile(temp$t, 0.025)
temp2$highCI = quantile(temp$t, 0.975)

Solution

Your example is not reproducible by me.

When I do temp = boot(dat, boot_fun, R = 1000), I get a WARNING:

ORDINARY NONPARAMETRIC BOOTSTRAP
Call:
boot(data = dat, statistic = boot_fun, R = 1000)
Bootstrap Statistics :
WARNING: All values of t1* are NA

I think your current code is pretty efficient, but if you're looking for other possibilities, you could try tidyverse to 1) group_by the relevant columns, 2) nest the relevant data for bootstrapping, 3) run your bootstrap with the nested data, 4) isolate the statistics you desire, then 5) return to a normal data frame

library(boot)
library(tidyverse)
dat1 <- dat %>%
          group_by(z,u) %>%
          nest() %>%
          mutate(data=map(data,~boot(.x, boot_fun, R=1000))) %>%
          mutate(data=map(data,~data.frame(mean=mean(.x$t), lowCI=quantile(.x$t, 0.025), highCI=quantile(.x$t,0.975)))) %>%
          unnest(data)