Search code examples
rdplyrtidyversepurrrspread

dplyr::spread for multiple columns using purrr::map


I want to get dplyr::spread for multiple columns using purrr::mapinto list of data.frames. Wonder how to achieve the desired result?

library(tidyverse)
mtcars %>%
  dplyr::group_by(gear, carb) %>%
  dplyr::summarise_at(
    .vars = names(.)[1:9]
    , .funs = c("mean")
  ) %>%
  dplyr::select(gear, carb, mpg) %>%
  tidyr::spread(key = "gear", value = mpg)

# A tibble: 6 x 4
   carb   `3`   `4`   `5`
  <dbl> <dbl> <dbl> <dbl>
1     1  20.3  29.1  NA  
2     2  17.2  24.8  28.2
3     3  16.3  NA    NA  
4     4  12.6  19.8  15.8
5     6  NA    NA    19.7
6     8  NA    NA    15  

mtcars %>%
  dplyr::group_by(gear, carb) %>%
  dplyr::summarise_at(
    .vars = names(.)[1:9]
    , .funs = c("mean")
  ) %>%
  dplyr::select(gear, carb, disp) %>%
  tidyr::spread(key = "gear", value = disp) 


# A tibble: 6 x 4
   carb   `3`   `4`   `5`
  <dbl> <dbl> <dbl> <dbl>
1     1  201.  84.2   NA 
2     2  346. 121.   108.
3     3  276.  NA     NA 
4     4  416. 164.   351 
5     6   NA   NA    145 
6     8   NA   NA    301 

Now I want to perform both process with a single command using purrr::map. Wonder how this can be achieved.

mtcars %>%
  dplyr::group_by(gear, carb) %>%
  dplyr::summarise_at(
    .vars = names(.)[1:9]
    , .funs = c("mean")
  ) %>%
  dplyr::select(gear, carb, mpg, disp) %>%
  purrr::map(.f = ~ tidyr::spread(data = mtcars,  key = "gear", value = .x))

Solution

  • So the key thing here is that the list you want to map over is actually the column names, not the columns or the dataframe itself. Here is a rough and ready approach that does what you want, though it is very brittle (the data frame and grouping columns are all hard-coded into the function). You might look into the programming with dplyr vignette if you need to do anything more fancy.

    library(tidyverse)
    to_spread <- mtcars %>%
      group_by(gear, carb) %>%
      summarise_all(mean)
    
    map(
      .x = colnames(to_spread)[3:11],
      .f = function(col) {
        to_spread %>%
          select(gear, carb, col) %>%
          spread(gear, col)
      }
    ) %>%
      set_names(colnames(to_spread)[3:11]) %>%
      head(3)
    #> $mpg
    #> # A tibble: 6 x 4
    #>    carb   `3`   `4`   `5`
    #>   <dbl> <dbl> <dbl> <dbl>
    #> 1     1  20.3  29.1  NA  
    #> 2     2  17.2  24.8  28.2
    #> 3     3  16.3  NA    NA  
    #> 4     4  12.6  19.8  15.8
    #> 5     6  NA    NA    19.7
    #> 6     8  NA    NA    15  
    #> 
    #> $cyl
    #> # A tibble: 6 x 4
    #>    carb   `3`   `4`   `5`
    #>   <dbl> <dbl> <dbl> <dbl>
    #> 1     1  5.33     4    NA
    #> 2     2  8        4     4
    #> 3     3  8       NA    NA
    #> 4     4  8        6     8
    #> 5     6 NA       NA     6
    #> 6     8 NA       NA     8
    #> 
    #> $disp
    #> # A tibble: 6 x 4
    #>    carb   `3`   `4`   `5`
    #>   <dbl> <dbl> <dbl> <dbl>
    #> 1     1  201.  84.2   NA 
    #> 2     2  346. 121.   108.
    #> 3     3  276.  NA     NA 
    #> 4     4  416. 164.   351 
    #> 5     6   NA   NA    145 
    #> 6     8   NA   NA    301
    

    Created on 2018-06-22 by the reprex package (v0.2.0).