I'm trying to apply a map function to each subset of a larger tibble, where the subsets come from grouping.
I'm fairly sure this used to work, so perhaps there was a breaking change. With group_split() I get an error; without group_split() I get a list of one-row tibbles instead of one tibble per group.
How can I get the function inside the map to see the whole tibble for each group?
library(tidyverse, quietly = TRUE)
ggplot2::diamonds %>%
  sample_n(30) %>%
  group_by(cut) %>%
  group_split() %>%
  pmap(function(...) {
    data <- tibble(...)
    print(data)
  })
#> Error in `pmap()`:
#> ℹ In index: 1.
#> ℹ With name: carat.
#> Caused by error in `tibble()`:
#> ! Tibble columns must have compatible sizes.
#> • Size 5: Existing data.
#> • Size 6: Column at position 3.
#> ℹ Only values of size one are recycled.
#> Backtrace:
#> ▆
#> 1. ├─... %>% ...
#> 2. └─purrr::pmap(...)
#> 3. └─purrr:::pmap_("list", .l, .f, ..., .progress = .progress)
#> 4. ├─purrr:::with_indexed_errors(...)
#> 5. │ └─base::withCallingHandlers(...)
#> 6. ├─purrr:::call_with_cleanup(...)
#> 7. └─.f(...)
#> 8. └─tibble::tibble(...)
#> 9. └─tibble:::tibble_quos(xs, .rows, .name_repair)
#> 10. └─tibble:::vectbl_recycle_rows(...)
#> 11. └─tibble:::abort_incompatible_size(...)
#> 12. └─tibble:::tibble_abort(...)
#> 13. └─rlang::abort(x, class, ..., call = call, parent = parent, use_cli_format = TRUE)
Without group_split():
library(tidyverse, quietly = TRUE)
ggplot2::diamonds %>%
  sample_n(5) %>%
  group_by(cut) %>%
  pmap(function(...) {
    data <- tibble(...)
    print(data)
  })
#> # A tibble: 1 × 10
#> carat cut color clarity depth table price x y z
#> <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 1.5 Premium J VVS2 61.5 60 8870 7.31 7.27 4.48
#> # A tibble: 1 × 10
#> carat cut color clarity depth table price x y z
#> <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 0.42 Ideal E VS1 60.9 57 1372 4.83 4.86 2.95
#> # A tibble: 1 × 10
#> carat cut color clarity depth table price x y z
#> <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 0.75 Ideal F SI1 62.3 57 2744 5.82 5.77 3.61
#> # A tibble: 1 × 10
#> carat cut color clarity depth table price x y z
#> <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 0.41 Ideal G SI1 62.6 54 719 4.74 4.78 2.98
#> # A tibble: 1 × 10
#> carat cut color clarity depth table price x y z
#> <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 0.51 Ideal D VVS2 61.7 56 2742 5.16 5.14 3.18
#> [[1]]
#> # A tibble: 1 × 10
#> carat cut color clarity depth table price x y z
#> <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 1.5 Premium J VVS2 61.5 60 8870 7.31 7.27 4.48
#>
#> [[2]]
#> # A tibble: 1 × 10
#> carat cut color clarity depth table price x y z
#> <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 0.42 Ideal E VS1 60.9 57 1372 4.83 4.86 2.95
#>
#> [[3]]
#> # A tibble: 1 × 10
#> carat cut color clarity depth table price x y z
#> <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 0.75 Ideal F SI1 62.3 57 2744 5.82 5.77 3.61
#>
#> [[4]]
#> # A tibble: 1 × 10
#> carat cut color clarity depth table price x y z
#> <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 0.41 Ideal G SI1 62.6 54 719 4.74 4.78 2.98
#>
#> [[5]]
#> # A tibble: 1 × 10
#> carat cut color clarity depth table price x y z
#> <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 0.51 Ideal D VVS2 61.7 56 2742 5.16 5.14 3.18
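Since you're iterating over a single list (the list of tibbles returned by group_split()), you only need purrr::map(). purrr::pmap() is for iterating over several lists in parallel. Given a data frame, it walks over the rows, which is why you got one-row tibbles without group_split(). Given the list from group_split(), it walks over the columns of every group at once: the first call received the carat column of each group, and tibble() failed because those columns have different lengths.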
ggplot2::diamonds %>%
  sample_n(30) %>%
  group_by(cut) %>%
  group_split() %>%
  # each .x is the full tibble for one cut
  map(
    ~ dplyr::mutate(
      .x,
      new_column = "See this value here?"))
[[1]]
# A tibble: 4 × 11
carat cut color clarity depth table price x y z new_column
<dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <chr>
1 1.01 Good H SI2 63.5 60 3971 6.23 6.31 3.98 See this value here?
2 1 Good H SI2 63.8 57 3640 6.32 6.28 4.02 See this value here?
3 1.55 Good G I1 63.6 58 4965 7.34 7.29 4.65 See this value here?
4 0.4 Good I VS2 63.8 54 666 4.68 4.72 3 See this value here?
[[2]]
# A tibble: 5 × 11
carat cut color clarity depth table price x y z new_column
<dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <chr>
1 0.34 Very Good G VVS2 62.3 59 740 4.45 4.48 2.78 See this value here?
2 0.4 Very Good E SI1 63 57 687 4.65 4.68 2.94 See this value here?
3 0.29 Very Good E VVS2 60.9 57 674 4.29 4.32 2.62 See this value here?
4 0.4 Very Good D SI1 60.3 62 720 4.7 4.75 2.85 See this value here?
5 1.02 Very Good H SI2 63.4 54 4252 6.37 6.43 4.06 See this value here?
...
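A small sketch of the difference between the two functions, using a toy tibble df and a hand-made list pieces (both hypothetical, standing in for the diamonds sample and the output of group_split()): pmap() on a data frame calls the function once per row, while map() on a list of tibbles calls it once per tibble.

```r
library(purrr)
library(tibble)

# Toy data standing in for the diamonds sample (hypothetical)
df <- tibble(a = 1:3, b = c("x", "y", "z"))

# pmap() treats a data frame as a list of equal-length columns and calls
# .f once per row, passing that row's values as named arguments.
by_row <- pmap(df, function(...) tibble(...))
length(by_row)     # 3 one-row tibbles

# A hand-made list of tibbles, like what group_split() returns (hypothetical)
pieces <- list(df[1:2, ], df[3, ])

# map() calls .f once per list element, so each call sees a whole tibble.
by_piece <- map(pieces, nrow)
unlist(by_piece)   # 2 1
```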
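You can also keep everything in a single data frame by nesting the groups with group_nest(), which I find more convenient. Each group's rows become one tibble in the list-column data, and map() works on that column: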
ggplot2::diamonds %>%
  sample_n(30) %>%
  group_by(cut) %>%
  group_nest() %>%
  dplyr::mutate(
    data = purrr::map(
      data,
      ~ dplyr::mutate(.x, new_column = "See this value here?")))
# A tibble: 5 × 2
cut data
<ord> <list>
1 Fair <tibble [3 × 10]>
2 Good <tibble [2 × 10]>
3 Very Good <tibble [9 × 10]>
4 Premium <tibble [9 × 10]>
5 Ideal <tibble [7 × 10]>
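If you don't need the split list itself, dplyr's group_map() folds the group_split() and map() steps into one call. A minimal sketch, assuming a recent dplyr: .x is the group's rows (with the grouping column dropped by default) and .y is a one-row tibble holding that group's key. The summary columns rows and mean_price are purely illustrative.

```r
library(dplyr)
library(tibble)

# group_map() calls the function once per group:
#   .x = that group's rows (grouping column `cut` dropped by default)
#   .y = a one-row tibble with the group's key
per_group <- ggplot2::diamonds %>%
  group_by(cut) %>%
  group_map(~ tibble(cut = .y$cut, rows = nrow(.x), mean_price = mean(.x$price)))

# One list element per cut level, each built from that group's whole tibble
bind_rows(per_group)
```

Because .x already excludes the grouping column, the sketch reattaches it from .y when building each result.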