
bootstrap by group and calculate statistics


I'm trying to bootstrap some model fits and then calculate statistics without having to rerun the models every time. I can do this fine if I calculate r2 inside the first do(), but I'd like to know how to access the stored models and test data afterwards.

library(dplyr)
library(tidyr)
library(modelr)
library(purrr)

allmdls <- 
  mtcars %>% 
  group_by(cyl) %>% 
  do({
    datsplit=crossv_mc(.,10)
    mdls=list(map(datsplit$train, ~glm(hp~disp,data=.,family=gaussian(link='identity'))))
    data_frame(datsplit=list(datsplit),mdls)
  })

and now something like:

allmdls %>%
  by_slice(dmap,.f=map2_dbl(.$mdls,.$datsplit$test,rsquare))

but I get

Error: .y is not a vector (NULL)

or

allmdls %>% 
   group_by(cyl) %>% 
   do({
     map2_df(.x=.$mdls, .y=.$datsplit, .f=map2_dbl(.x=.x,.y=.y$test,.f=rsquare))
   })

Error in map2_dbl(.x = .x, .y = .y$test, .f = rsquare) : object '.x' not found

I can't seem to get the syntax right.

help? Thanks

EDIT: Thanks to @aosmith's comment, I created a somewhat simpler solution:

mtcars %>% 
  group_by(cyl) %>% 
  do({
    datsplit=crossv_mc(.,10) %>% 
      mutate(mdls=map(train, ~glm(hp~disp,data=.)),
             r2=map2_dbl(mdls,test,rsquare),
             pctmae=map2_dbl(mdls,test,function(model,data) {mae(model,data)/mean(model$model$hp,na.rm=TRUE)*100})
      )
  })
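
For reference, the pctmae column above is the mean absolute error on the test set expressed as a percentage of the mean response in the training data (model$model$hp is the response stored in the fitted model frame). A minimal base R sketch of that arithmetic, using made-up numbers rather than real model output:

```r
# Hypothetical observed and predicted test-set values, for illustration only
observed  <- c(100, 150, 200)
predicted <- c(110, 140, 190)

# Hypothetical mean of the response in the training data
train_mean <- 150

# Mean absolute error: average size of the residuals
mae_value <- mean(abs(observed - predicted))

# MAE as a percentage of the training-set mean response
pct_mae <- mae_value / train_mean * 100
```

Note that the denominator comes from the training data, not the test data, so the percentage is relative to what the model was fit on.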

Solution

  • One option is to use map2 within mutate. Because you are using lists of lists I ended up with nested map2s to get access to the innermost lists. I pulled the test data out via map(datsplit, "test"), as neither the dollar sign operator nor the extract brackets were working for me.

    mutate(allmdls, rsq = map2(mdls, map(datsplit, "test"), ~map2_dbl(.x, .y, rsquare)))
    

    Here is another option that avoids the nested lists altogether:

    mtcars %>%
        split(.$cyl) %>%
        map_df(crossv_mc, 10, .id = "cyl") %>%
        mutate(models = map(train, ~glm(hp ~ disp, data = .x)),
              rsq = map2_dbl(models, test, rsquare))
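
To illustrate why the nested map2 and map(datsplit, "test") are needed, here is a small sketch on a toy structure. The objects below are hypothetical stand-ins (plain numbers instead of resample objects and glm fits), chosen only to mirror the shape of the list columns in allmdls. It also shows the inner call wrapped in a formula (~), since passing a bare map2_dbl(...) call as .f makes R evaluate it immediately instead of treating it as a function, which is probably what produced the "object '.x' not found" error in the question.

```r
library(purrr)

# Hypothetical stand-in for the `datsplit` list column: one data frame per
# group, each with a `test` column
datsplit <- list(
  data.frame(train = 1:2, test = c(10, 20)),
  data.frame(train = 3:4, test = c(30, 40))
)

# `$` looks for an element named "test" in the OUTER list and finds none,
# so it returns NULL (compare the ".y is not a vector (NULL)" error)
outer_lookup <- datsplit$test

# map() with a string extracts "test" from each inner data frame instead
tests <- map(datsplit, "test")

# Hypothetical stand-in for the `mdls` list column: a list of "models" per group
mdls <- list(list(1, 2), list(3, 4))

# Outer map2 pairs up the groups; inner map2_dbl pairs each model with its
# test set. The `~` turns the inner call into a function of .x and .y.
res <- map2(mdls, tests, ~ map2_dbl(.x, .y, function(model, data) model * data))
```

With the real data, function(model, data) model * data would be replaced by rsquare, as in the mutate() call above.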