
How to use purrr::map on a full tibble rather than individual tibble rows

I'm trying to apply a map function to each subset of a larger tibble, where the subsets are defined by a grouping variable.

I swear this used to work, so perhaps there was a breaking change. With group_split() I get an error, and without group_split() I get a list of one-row tibbles instead of one full tibble per group.

How can I get the function inside the map to see each group as a single tibble?

library(tidyverse, quietly = TRUE)
ggplot2::diamonds %>% 
  sample_n(30) %>% 
  group_by(cut) %>% 
  group_split() %>% 
  pmap(function(...) { 
    data <- tibble(...)
    print(data)
  })
#> Error in `pmap()`:
#> ℹ In index: 1.
#> ℹ With name: carat.
#> Caused by error in `tibble()`:
#> ! Tibble columns must have compatible sizes.
#> • Size 5: Existing data.
#> • Size 6: Column at position 3.
#> ℹ Only values of size one are recycled.
#> Backtrace:
#>      ▆
#>   1. ├─... %>% ...
#>   2. └─purrr::pmap(...)
#>   3.   └─purrr:::pmap_("list", .l, .f, ..., .progress = .progress)
#>   4.     ├─purrr:::with_indexed_errors(...)
#>   5.     │ └─base::withCallingHandlers(...)
#>   6.     ├─purrr:::call_with_cleanup(...)
#>   7.     └─.f(...)
#>   8.       └─tibble::tibble(...)
#>   9.         └─tibble:::tibble_quos(xs, .rows, .name_repair)
#>  10.           └─tibble:::vectbl_recycle_rows(...)
#>  11.             └─tibble:::abort_incompatible_size(...)
#>  12.               └─tibble:::tibble_abort(...)
#>  13.                 └─rlang::abort(x, class, ..., call = call, parent = parent, use_cli_format = TRUE)
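The error comes from how pmap() reads its input. A minimal sketch with two hypothetical toy data frames (standing in for the output of group_split(), not the diamonds data) suggests that pmap() treats each data frame as one of its parallel inputs, so each call receives the i-th column of every group rather than one whole group. That matches the error above, where tibble() is handed the `carat` columns of groups with different row counts.

```r
library(purrr)

# Hypothetical toy "groups": same columns, different row counts,
# like the list returned by group_split()
groups <- list(
  data.frame(a = 1:2, b = c("x", "y")),
  data.frame(a = 3:5, b = c("p", "q", "r"))
)

# pmap() treats each data frame as one parallel input, so call i
# receives column i of EVERY data frame instead of one whole group
calls <- pmap(groups, function(...) list(...))

length(calls)        # 2: one call per column (a, b), not one per group
lengths(calls[[1]])  # 2 3: column `a` from each group, different sizes
```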

Without group_split():

library(tidyverse, quietly = TRUE)
ggplot2::diamonds %>% 
  sample_n(5) %>% 
  group_by(cut) %>% 
  pmap(function(...) { 
    data <- tibble(...)
    print(data)
  })
#> # A tibble: 1 × 10
#>   carat cut     color clarity depth table price     x     y     z
#>   <dbl> <ord>   <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#> 1   1.5 Premium J     VVS2     61.5    60  8870  7.31  7.27  4.48
#> # A tibble: 1 × 10
#>   carat cut   color clarity depth table price     x     y     z
#>   <dbl> <ord> <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#> 1  0.42 Ideal E     VS1      60.9    57  1372  4.83  4.86  2.95
#> # A tibble: 1 × 10
#>   carat cut   color clarity depth table price     x     y     z
#>   <dbl> <ord> <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#> 1  0.75 Ideal F     SI1      62.3    57  2744  5.82  5.77  3.61
#> # A tibble: 1 × 10
#>   carat cut   color clarity depth table price     x     y     z
#>   <dbl> <ord> <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#> 1  0.41 Ideal G     SI1      62.6    54   719  4.74  4.78  2.98
#> # A tibble: 1 × 10
#>   carat cut   color clarity depth table price     x     y     z
#>   <dbl> <ord> <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#> 1  0.51 Ideal D     VVS2     61.7    56  2742  5.16  5.14  3.18
#> [[1]]
#> # A tibble: 1 × 10
#>   carat cut     color clarity depth table price     x     y     z
#>   <dbl> <ord>   <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#> 1   1.5 Premium J     VVS2     61.5    60  8870  7.31  7.27  4.48
#> [[2]]
#> # A tibble: 1 × 10
#>   carat cut   color clarity depth table price     x     y     z
#>   <dbl> <ord> <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#> 1  0.42 Ideal E     VS1      60.9    57  1372  4.83  4.86  2.95
#> [[3]]
#> # A tibble: 1 × 10
#>   carat cut   color clarity depth table price     x     y     z
#>   <dbl> <ord> <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#> 1  0.75 Ideal F     SI1      62.3    57  2744  5.82  5.77  3.61
#> [[4]]
#> # A tibble: 1 × 10
#>   carat cut   color clarity depth table price     x     y     z
#>   <dbl> <ord> <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#> 1  0.41 Ideal G     SI1      62.6    54   719  4.74  4.78  2.98
#> [[5]]
#> # A tibble: 1 × 10
#>   carat cut   color clarity depth table price     x     y     z
#>   <dbl> <ord> <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#> 1  0.51 Ideal D     VVS2     61.7    56  2742  5.16  5.14  3.18
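Without group_split(), pmap() receives the grouped tibble itself. A data frame is a list of columns, so pmap() appears to zip those columns together and call the function once per row, ignoring the grouping entirely. A minimal sketch on a hypothetical three-row tibble (the names `df`, `a` and `b` are illustrative only):

```r
library(purrr)
library(tibble)
library(dplyr)

# Hypothetical toy data: three rows split into two groups by `b`
df <- group_by(tibble(a = 1:3, b = c("x", "x", "y")), b)

# pmap() zips the columns of df and calls the function once per row,
# passing that row's values as named arguments (a = ..., b = ...)
rows <- pmap(df, function(...) tibble(...))

length(rows)        # 3: one element per row, the two groups are ignored
sapply(rows, nrow)  # 1 1 1: every element is a one-row tibble
```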


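  • Since each group is already one element of the list that group_split() returns, you only need purrr::map(). purrr::pmap() is for iterating over several lists in parallel (purrr::map2() handles exactly two), so when you give it a list of tibbles it treats each tibble as one of those parallel inputs and passes your function one column from every group at a time. That is why tibble() complains about incompatible sizes.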

    ggplot2::diamonds %>% 
      sample_n(30) %>% 
      group_by(cut) %>% 
      group_split() %>% 
      purrr::map(
        ~ dplyr::mutate(.x,
          new_column = "See this value here?"))
    # A tibble: 4 × 11
      carat cut   color clarity depth table price     x     y     z new_column          
      <dbl> <ord> <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl> <chr>               
    1  1.01 Good  H     SI2      63.5    60  3971  6.23  6.31  3.98 See this value here?
    2  1    Good  H     SI2      63.8    57  3640  6.32  6.28  4.02 See this value here?
    3  1.55 Good  G     I1       63.6    58  4965  7.34  7.29  4.65 See this value here?
    4  0.4  Good  I     VS2      63.8    54   666  4.68  4.72  3    See this value here?
    # A tibble: 5 × 11
      carat cut       color clarity depth table price     x     y     z new_column          
      <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl> <chr>               
    1  0.34 Very Good G     VVS2     62.3    59   740  4.45  4.48  2.78 See this value here?
    2  0.4  Very Good E     SI1      63      57   687  4.65  4.68  2.94 See this value here?
    3  0.29 Very Good E     VVS2     60.9    57   674  4.29  4.32  2.62 See this value here?
    4  0.4  Very Good D     SI1      60.3    62   720  4.7   4.75  2.85 See this value here?
    5  1.02 Very Good H     SI2      63.4    54  4252  6.37  6.43  4.06 See this value here?

    You can also nest the data inside a single data frame with group_nest(), which gives one row per group and a list-column of tibbles. I find this more convenient:

    ggplot2::diamonds %>% 
      sample_n(30) %>% 
      group_by(cut) %>% 
      group_nest() %>%
      dplyr::mutate(
        data = purrr::map(
          data, ~ dplyr::mutate(.x, new_column = "See this value here?")))
    # A tibble: 5 × 2
      cut       data             
      <ord>     <list>           
    1 Fair      <tibble [3 × 10]>
    2 Good      <tibble [2 × 10]>
    3 Very Good <tibble [9 × 10]>
    4 Premium   <tibble [9 × 10]>
    5 Ideal     <tibble [7 × 10]>
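A reproducible sketch of both approaches from the answer follows. It assumes the built-in mtcars data in place of a random diamonds sample so that the group sizes are fixed; `n_in_group` is a hypothetical column name, and n() is dplyr's row-count helper.

```r
library(dplyr)
library(purrr)

# Approach 1: group_split() returns a list with one tibble per group;
# map() calls the lambda once per group, with the whole group as .x
split_list <- mtcars %>%
  group_by(cyl) %>%
  group_split() %>%
  map(~ mutate(.x, n_in_group = n()))

# Approach 2: group_nest() returns one row per group with a list-column
# `data`; map() over that column, again one whole tibble per call
nested <- mtcars %>%
  group_by(cyl) %>%
  group_nest() %>%
  mutate(data = map(data, ~ mutate(.x, n_in_group = n())))

sapply(split_list, nrow)   # 11 7 14: cars with 4, 6 and 8 cylinders
sapply(nested$data, ncol)  # 11 11 11: cyl moved out of data, n_in_group added
```

In both cases the function sees a complete group per call, so the row count it computes equals the size of that group.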