
R - dplyr bootstrap several variables


I am trying out bootstrapping with dplyr and I am stuck on a simple line of code.

Using the function bootstrap, I found out that it is possible to do

library(dplyr)
library(broom)

mtcars %>% bootstrap(10) %>% 
  do(tidy(sample(.$cyl, 2)))

to get a nice straightforward output

   replicate     x
         (int) (dbl)
1          1     6
2          1     8
3          2     6
4          2     8
...

However, it would be nice to get more variables (columns), and I can't figure out how.

I thought something like

mtcars %>% bootstrap(10) %>% 
  do(tidy(sample(., 2)))

or

mtcars %>% bootstrap(10) %>% 
  do(tidy(sample_n(2)))

would work, but neither does.

Any clue how I can subset several variables?

Suppose I want mpg, cyl and disp, with output something like

   replicate   cyl   mpg  disp
        (int) (dbl)
1          1     6   21  ...
2          1     4   22  ...
3          2     6   ... 
4          2     8   ...
...

(I am randomly choosing two cases, i.e. a sample size of 2, and repeating this routine (the bootstrap) 10 times.)


Solution

  • Using

    set.seed(123)
    sapply(mtcars, function(v) sample(v,2))
    

    you can sample 2 values from each column of mtcars. Note, however, that the columns are sampled independently of each other, so the values in one output row no longer come from the same car (I am not sure this is what you want, or that it makes sense). A solution along these lines using broom might be:

    mtcars %>%
        bootstrap(10) %>%
        do(tidy(sapply(., function(v) sample(v,2))))
    

    If, on the other hand, preserving the relations between the columns is important, you could use something like

    do.call("rbind",lapply(1:10, function(dum) mtcars[sample.int(nrow(mtcars), 2), ]))
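    To get closer to the output sketched in the question (a replicate column plus mpg, cyl and disp), the row-preserving approach above can be extended slightly. This is only a minimal sketch using base R; the names `vars`, `n_rep`, `n_draw` and `boot` are made up for illustration, and the seed is arbitrary.

```r
set.seed(123)

vars   <- c("cyl", "mpg", "disp")  # columns to keep (illustrative choice)
n_rep  <- 10                       # number of replicates
n_draw <- 2                        # rows drawn per replicate

boot <- do.call(rbind, lapply(seq_len(n_rep), function(i) {
  # Draw whole rows, so cyl, mpg and disp stay matched within each car
  rows <- mtcars[sample.int(nrow(mtcars), n_draw), vars]
  # Prepend the replicate number as its own column
  cbind(replicate = i, rows)
}))

# Drop the car names that rbind carries over as row names
rownames(boot) <- NULL

head(boot, 4)
```

    Because each replicate draws row indices rather than individual values, every output row corresponds to one actual car in mtcars.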