Pass grouped data.frame using dplyr / magrittr


With base::by() and data.table, we can group by variable(s) and then access a data.frame that is subset by the groups. How can I do the equivalent with dplyr or magrittr?

I tried tib %>% group_by(grp) %>% mutate(V2 = fx(.)) but instead of passing the subgroups, the dot passes the entire grouped tibble from the LHS. Here's an MRE:

library(dplyr)
tib = tibble(grp = rep(1:2, 1:2),
             V1 = 1:3)
tib
#> # A tibble: 3 x 2
#>     grp    V1
#>   <int> <int>
#> 1     1     1
#> 2     2     2
#> 3     2     3

fx = function(x){
  ans = seq(nrow(x))
  print(ans)
}

tib %>%
  group_by(grp)%>%
  mutate(V2 = fx(.))
#> [1] 1 2 3
#> Error: Problem with `mutate()` input `V2`.
#> x Input `V2` can't be recycled to size 1.
#> i Input `V2` is `fx(.)`.
#> i Input `V2` must be size 1, not 3.
#> i The error occured in group 1: grp = 1.

And here is the behavior I hoped for, using data.table:

library(data.table)
as.data.table(tib)[, V2 := fx(.SD), grp][]
#> [1] 1
#> [1] 1 2
#>      grp    V1    V2
#>    <int> <int> <int>
#> 1:     1     1     1
#> 2:     2     2     1
#> 3:     2     3     2

Solution

  • You can use cur_data() from dplyr 1.0.0 onwards.

    library(dplyr)
    tib %>% group_by(grp)%>% mutate(V2 = fx(cur_data()))
    
    #[1] 1
    #[1] 1 2
    # A tibble: 3 x 3
    # Groups:   grp [2]
    #    grp    V1    V2
    #  <int> <int> <int>
    #1     1     1     1
    #2     2     2     1
    #3     2     3     2
    

    Note that cur_data() passes the data without the grouping variable (grp). If you want the grouping variable to be passed to the function as well, use cur_data_all() instead.
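
For newer dplyr versions, here is a minimal sketch of two alternatives, assuming dplyr 1.1.0 or later, where cur_data() is deprecated in favour of pick(). The helper fx is the one from the question with the print() dropped, and the names res_pick and res_modify are only illustrative. Like cur_data(), both approaches hand the function the rows of the current group without the grouping column.

```r
library(dplyr)

tib <- tibble(grp = rep(1:2, 1:2), V1 = 1:3)

# Helper from the question, without print(): row index within the data it receives
fx <- function(x) seq_len(nrow(x))

# pick() returns the selected columns for the current group only;
# grouping columns cannot be selected, so fx() sees just V1 here
res_pick <- tib %>%
  group_by(grp) %>%
  mutate(V2 = fx(pick(everything()))) %>%
  ungroup()

# group_modify() calls the lambda once per group: .x holds the group's rows
# (without grp) and .y is a one-row tibble with the group key
res_modify <- tib %>%
  group_by(grp) %>%
  group_modify(~ mutate(.x, V2 = fx(.x))) %>%
  ungroup()
```

group_modify() is probably the closer match to base::by(), since the function receives and returns a whole data frame per group, whereas pick() fits when only one new column is computed inside mutate().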