r dplyr purrr rlang broom

(tidy, glance, augment) with exec


I see from the purrr documentation that it should be possible to map a list of functions onto arguments using the map(list(fn1, fn2, fn3), exec, !!!args) syntax or something similar. How would this work for the broom functions tidy, glance, and augment, which usually must be supplemented with do? These are three functions I almost always like to execute at the same time on the same data and model. Of course I can do this explicitly:

# works but is repetitive
library(dplyr)   # %>% and do()
library(purrr)   # map() and exec()
library(broom)   # glance(), tidy(), augment()

MY_MODEL <- hp ~ cyl
my_glance <- mtcars %>% do(glance(lm(data = ., formula = MY_MODEL)))
my_tidy <- mtcars %>% do(tidy(lm(data = ., formula = MY_MODEL)))
my_augment <- mtcars %>% do(augment(lm(data = ., formula = MY_MODEL)))

I suspect there is a better, more compact way to do this without having to retype ...lm(data = ., formula = MY_MODEL... every time, but I couldn't figure it out. I tried

# doesn't work
omnibroom <- function(df, model){
    map(list(glance, tidy, augment),
        exec,
        ~{(do(.x(lm(data = df, formula = model))))}
        )
    }

omnibroom(mtcars, MY_MODEL)

but I think I don't understand the !!! syntax appropriately.

Is there a compact idiom for calling these three broom functions on the same model and data?


Solution

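  • It's possible to do this in two lines with simple refactoring: fit the model once, then map over the three functions. Because map() forwards its extra arguments, each call becomes exec(f, mdl), which is just f(mdl). No do or !!! necessary.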

    mdl <- mtcars %>% lm(data=., formula=MY_MODEL)
    res1 <- map( list(glance, tidy, augment), exec, mdl )
    

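    If you really want to squish it down into a single line, wrap the right-hand side in { }. The braces stop the pipe from also inserting the data as the first argument of map(), so the . placeholder reaches lm() only: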

    res2 <- mtcars %>% 
        {map( list(glance, tidy, augment), exec, lm(data=., formula=MY_MODEL) )}
    

    Verification:

    identical( res1, list(my_glance, my_tidy, my_augment) )    # TRUE
    identical( res1, res2 )                                    # TRUE
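
    As for the !!! from the question: it only matters when the arguments are already collected in a list. The sketch below contrasts the two forms. The names args, fns, res_plain, and res_spliced are made up for illustration, and the explicit function(f) wrapper is just one cautious way to write the spliced call.

```r
library(purrr)
library(broom)

MY_MODEL <- hp ~ cyl
mdl <- lm(MY_MODEL, data = mtcars)

# Naming the list elements gives a named list of results.
fns <- list(glance = glance, tidy = tidy, augment = augment)

# Plain form: map() forwards mdl to exec(), so each call is exec(f, mdl).
res_plain <- map(fns, exec, mdl)

# Spliced form: the arguments live in a list and !!! unpacks them inside
# exec(). All three broom generics take the model as argument `x`.
args <- list(x = mdl)
res_spliced <- map(fns, function(f) exec(f, !!!args))
```

    Both calls should yield the same three data frames; with a named function list they can be pulled out as res_plain$tidy and so on.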
    

    EDIT to address grouping

    Functions such as lm() that are not group-aware ignore data frame groups. While do is a popular approach to handle grouping in this case, I personally think that tidyr::nest() is more intuitive because it places all intermediates and results alongside the data:

    ## "Listify" broom functions: f -> map( ..., f )
    omnibroom <- map( list(glance, tidy, augment), ~function(l) map(l, .x) ) %>%
        set_names( c("glance","tidy","augment") )
    
    result <- mtcars %>% nest( data = -gear ) %>%
        mutate( model = map(data, lm, formula=MY_MODEL) ) %>%
        mutate_at( "model", omnibroom )
    
    #  # A tibble: 3 x 6
    #     gear data              model  glance           tidy           augment
    #    <dbl> <list>            <list> <list>           <list>         <list>
    #  1     4 <tibble [12 × 10… <lm>   <tibble [1 × 11… <tibble [2 × … <tibble [12 × …
    #  2     3 <tibble [15 × 10… <lm>   <tibble [1 × 11… <tibble [2 × … <tibble [15 × …
    #  3     5 <tibble [5 × 10]> <lm>   <tibble [1 × 11… <tibble [2 × … <tibble [5 × 9…
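
    Since dplyr 1.0.0, mutate_at() is superseded by across(). A minimal sketch of the same pipeline in that style follows, assuming dplyr >= 1.0.0 and tidyr >= 1.0.0; the helper listify and the name broom_cols are hypothetical and not part of any package.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

MY_MODEL <- hp ~ cyl

# Hypothetical helper: turn f into a function that applies f to every
# element of a list column. force() pins f down before map() moves on.
listify <- function(f) {
  force(f)
  function(models) map(models, f)
}

broom_cols <- map(list(glance = glance, tidy = tidy, augment = augment), listify)

result <- mtcars %>%
  nest(data = -gear) %>%
  mutate(model = map(data, ~ lm(MY_MODEL, data = .x))) %>%
  # .names = "{.fn}" names each new column after its function only.
  mutate(across(model, broom_cols, .names = "{.fn}"))
```

    This should produce the same six columns as the mutate_at() version shown above.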
    
    

    This format also naturally lends itself to unnesting, since broom functions produce data frames:

    result %>% select( gear, tidy ) %>% unnest( tidy )
    
    #  # A tibble: 6 x 6
    #     gear term        estimate std.error statistic p.value
    #    <dbl> <chr>          <dbl>     <dbl>     <dbl>   <dbl>
    #  1     4 (Intercept)    -5.00     25.3     -0.198 0.847
    #  2     4 cyl            20.2       5.30     3.82  0.00339
    #  3     3 (Intercept)   -47.5      56.1     -0.847 0.412
    #  4     3 cyl            30.0       7.42     4.04  0.00142
    #  5     5 (Intercept)  -101.       51.9     -1.94  0.148
    #  6     5 cyl            49.4       8.28     5.96  0.00944