Purrr with group_by

Using purrr, I would like to modify the following example from Advanced R to calculate the mean of each variable in the mtcars data, split by cyl:

    by_cyl <- split(mtcars, mtcars$cyl)
    by_cyl %>% 
      map(~ lm(mpg ~ wt, data = .x)) %>% 
      map(coef) %>% 
      map_dbl(2)

I can do this for a specific value of cyl:

mtcars %>% 
  filter(cyl == 8) %>% 
  map_df(mean)

But this does not work:

by_cyl %>% 
  map_df(~mean(.x, na.rm = TRUE))

I guess it's because I'm passing mean over a whole dataframe, instead of a vector, but I don't know how to fix this.

Solution

Another option is to do nested calls to map()/map_df():

library("purrr")
library("magrittr")
#> 
#> Attaching package: 'magrittr'
#> The following object is masked from 'package:purrr':
#> 
#>     set_names
by_cyl <- split(mtcars, mtcars$cyl)
by_cyl %>% map_df(map_df, mean, na.rm = TRUE)
#> # A tibble: 3 x 11
#>     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  26.7     4  105.  82.6  4.07  2.29  19.1 0.909 0.727  4.09  1.55
#> 2  19.7     6  183. 122.   3.59  3.12  18.0 0.571 0.429  3.86  3.43
#> 3  15.1     8  353. 209.   3.23  4.00  16.8 0     0.143  3.29  3.5

Created on 2023-06-28 with reprex v2.0.2

Basically, a list of data frames (what you obtain after using split()) is a list of lists, because each data frame is itself a list of columns. So you have to map across the list you get from split() and then across the columns of each data frame in that list.
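To make the "list of lists" point concrete, here is a minimal sketch using the same by_cyl object. The name cyl4_means is only for illustration and is not part of the original answer.

```r
library(purrr)

by_cyl <- split(mtcars, mtcars$cyl)

# split() returns a named list with one data frame per value of cyl
names(by_cyl)

# Each data frame is itself a list whose elements are its columns
is.list(by_cyl[["4"]])

# Mapping over a single data frame therefore visits one column at a time,
# so mean() receives a numeric vector rather than a whole data frame
cyl4_means <- map_dbl(by_cyl[["4"]], mean, na.rm = TRUE)
cyl4_means
```

The nested map_df() call does this same column-wise step once for every element of by_cyl.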

EDIT: The syntax for the inner part of the first map_df() is a simpler way of specifying ~(.x %>% map_df(~(mean(.x, na.rm = TRUE)))). It uses the ... or "dots" argument of the outer map_df() to pass mean and na.rm = TRUE on to the inner map_df(), which in turn passes na.rm = TRUE on to mean().
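As a rough check that the two spellings are equivalent, the sketch below writes the long form with ordinary anonymous functions and compares the results. The names short_form, explicit_form, df and col are made up for this example.

```r
library(purrr)

by_cyl <- split(mtcars, mtcars$cyl)

# Short form: the outer map_df() forwards `mean, na.rm = TRUE`
# through its dots to the inner map_df()
short_form <- map_df(by_cyl, map_df, mean, na.rm = TRUE)

# Explicit form: the same two levels of mapping written out by hand
explicit_form <- map_df(by_cyl, function(df) {
  map_df(df, function(col) mean(col, na.rm = TRUE))
})

# Compare the two results
all.equal(short_form, explicit_form)
```

Both versions need dplyr installed, since map_df() relies on it to bind the rows together.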

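But this approach is much slower than Quinten's answer, which uses colMeans(). In the benchmark below its median time is roughly four times longer (19.7ms vs 4.57ms) and it allocates far more memory: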

library("purrr")
library("magrittr")
#> 
#> Attaching package: 'magrittr'
#> The following object is masked from 'package:purrr':
#> 
#>     set_names
by_cyl <- split(mtcars, mtcars$cyl)
bench::mark(by_cyl %>% map_df(map_df, mean, na.rm = TRUE),
            by_cyl %>% map_df(colMeans))
#> # A tibble: 2 x 6
#>   expression                            min  median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                        <bch:t> <bch:t>     <dbl> <bch:byt>    <dbl>
#> 1 by_cyl %>% map_df(map_df, mean, ~ 14.39ms  19.7ms      47.7    3.87MB     6.81
#> 2 by_cyl %>% map_df(colMeans)        3.68ms  4.57ms     192.   153.88KB     8.82