I am trying out bootstrapping with dplyr and I am stuck on a simple line of code. Using the function bootstrap, I found out that it is possible to do
library(dplyr)
library(broom)
mtcars %>% bootstrap(10) %>%
do(tidy(sample(.$cyl, 2)))
to get a nice straightforward output
replicate x
(int) (dbl)
1 1 6
2 1 8
3 2 6
4 2 8
...
However, it would be nice to get more variables (columns), but I can't figure out how.
I thought something like
mtcars %>% bootstrap(10) %>%
do(tidy(sample(., 2)))
or
mtcars %>% bootstrap(10) %>%
do(tidy(sample_n(2)))
would work, but neither does.
Any clue how I can subset several variables?
Imagine I want to get mpg, cyl and disp, to get something like
(output)
replicate cyl mpg disp
(int) (dbl)
1 1 6 21 ...
2 1 4 22 ...
3 2 6 ...
4 2 8 ...
...
(I am randomly choosing two cases, sample = 2, and I repeat this routine (bootstrap) 10 times.)
Using
set.seed(123)
sapply(mtcars, function(v) sample(v,2))
you can sample 2 values from each column of mtcars. Note, however, that the columns are then sampled independently of each other, so the values in one output row no longer come from the same car (I am not sure that this is what you want or that it makes sense). A solution using broom might therefore be:
mtcars %>%
bootstrap(10) %>%
do(tidy(sapply(., function(v) sample(v,2))))
If, on the other hand, preserving the relations between the columns is important, you could use something like
do.call("rbind",lapply(1:10, function(dum) mtcars[sample.int(nrow(mtcars), 2), ]))
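To also get a replicate column and keep only mpg, cyl and disp, the row-wise approach can be extended along these lines. This is a minimal base R sketch rather than a broom solution, and the names n_rep, n_rows, vars and res are made up for illustration:

```r
set.seed(123)

n_rep  <- 10                      # number of replicates (hypothetical name)
n_rows <- 2                       # rows drawn per replicate (hypothetical name)
vars   <- c("mpg", "cyl", "disp") # columns to keep (hypothetical name)

res <- do.call(rbind, lapply(seq_len(n_rep), function(i) {
  # draw whole rows, so mpg, cyl and disp stay together for each car
  idx <- sample.int(nrow(mtcars), n_rows)
  # prepend the replicate id; cbind recycles the scalar across the rows
  cbind(replicate = i, mtcars[idx, vars])
}))

# drop the car names that rbind carries over as row names
rownames(res) <- NULL
head(res)
```

Each replicate contributes n_rows complete rows, so res has n_rep * n_rows rows and the columns replicate, mpg, cyl and disp. Within a replicate, sample.int draws without replacement by default; pass replace = TRUE if you want the same car to be able to appear twice.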