Tags: r, dplyr, tidyeval, nse

dplyr: How to use group_by inside a function?


I want to use the dplyr::group_by function inside another function, but I do not know how to pass the grouping arguments through to it.

Can someone provide a working example?

library(dplyr)
data(iris)
iris %.% group_by(Species) %.% summarise(n = n())
## Source: local data frame [3 x 2]
##      Species  n
## 1  virginica 50
## 2 versicolor 50
## 3     setosa 50

mytable0 <- function(x, ...) x %.% group_by(...) %.% summarise(n = n())
mytable0(iris, "Species") # OK
## Source: local data frame [3 x 2]
##      Species  n
## 1  virginica 50
## 2 versicolor 50
## 3     setosa 50

mytable1 <- function(x, key) x %.% group_by(as.name(key)) %.% summarise(n = n())
mytable1(iris, "Species") # Wrong!
# Error: unsupported type for column 'as.name(key)' (SYMSXP)

mytable2 <- function(x, key) x %.% group_by(key) %.% summarise(n = n())
mytable2(iris, "Species") # Wrong!
# Error: index out of bounds

Solution

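  • For programming, group_by_ is the standard-evaluation counterpart to group_by (the underscore verbs have since been deprecated in favor of tidy evaluation; see Updates 5 and 6):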

    library(dplyr)
    
    mytable <- function(x, ...) x %>% group_by_(...) %>% summarise(n = n())
    mytable(iris, "Species")
    # or iris %>% mytable("Species")
    

    which gives:

         Species  n
    1     setosa 50
    2 versicolor 50
    3  virginica 50
    

    Update: When this was written, dplyr used %.%, which is what the code above originally used. %>% is now favored, so the code has been changed to use it to keep this answer relevant.

    Update 2: regroup is now deprecated; use group_by_ instead.

    Update 3: In newer versions of dplyr, group_by_(list(...)) becomes group_by_(...), as per Roberto's comment.

    Update 4: Added a minor variation suggested in the comments.

    Update 5: With rlang/tidyeval it is now possible to do this:

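    library(rlang)
    mytable <- function(x, ...) {
      # syms() takes a single vector, so combine the strings first;
      # this also lets several grouping columns be passed
      group_ <- syms(c(...))
      x %>% 
        group_by(!!!group_) %>% 
        summarise(n = n())
    }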
    mytable(iris, "Species")
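
    For a single column name passed as a string, the .data pronoun is another option that avoids building symbols explicitly. This is a minimal sketch, assuming dplyr >= 1.0.0 (for summarise()'s .groups argument); mytable_chr is a hypothetical name:

    ```r
    library(dplyr)

    # Hypothetical helper: group by one column whose name arrives as a string.
    # .data[[key]] looks the column up by name inside group_by().
    mytable_chr <- function(x, key) {
      x %>%
        group_by(.data[[key]]) %>%
        summarise(n = n(), .groups = "drop")  # .groups assumes dplyr >= 1.0.0
    }

    res <- mytable_chr(iris, "Species")
    res
    ```

    For several string keys, the syms() version above is the better fit.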
    

    Alternatively, pass Species unevaluated (i.e. without quotes):

    library(rlang)
    mytable <- function(x, ...) {
      group_ <- enquos(...)
      x %>% 
        group_by(!!!group_) %>% 
        summarise(n = n())
    }
    mytable(iris, Species)
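
    When the grouping columns are passed unquoted, the dots can also be forwarded straight to group_by(), which captures them itself, so the explicit enquos()/!!! step is optional. A sketch under the same dplyr >= 1.0.0 assumption; mytable_dots is a hypothetical name, and the call groups mtcars by two columns to show that several variables work:

    ```r
    library(dplyr)

    # Hypothetical helper: forward the dots directly; group_by() quotes
    # them itself, so cyl and gear can be passed without quotes.
    mytable_dots <- function(x, ...) {
      x %>%
        group_by(...) %>%
        summarise(n = n(), .groups = "drop")  # .groups assumes dplyr >= 1.0.0
    }

    res <- mytable_dots(mtcars, cyl, gear)
    res
    ```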
    

    Update 6: rlang 0.4.0 added the {{ }} ("curly-curly") notation, which handles a single unquoted grouping argument in one step:

    mytable <- function(x, group) {
      x %>% 
        group_by({{group}}) %>% 
        summarise(n = n())
    }
    mytable(iris, Species)
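
    dplyr's count() performs the same group_by() + summarise(n = n()) step in one call and also accepts embraced arguments, so the function can be written more compactly. This swaps in count() rather than the answer's own pipeline; mytable_count is a hypothetical name, and it assumes a dplyr/rlang version recent enough to support {{ }}:

    ```r
    library(dplyr)

    # Hypothetical helper: count() groups by the embraced column and
    # returns one row per group with the row count in column n.
    mytable_count <- function(x, group) {
      x %>% count({{ group }})
    }

    res <- mytable_count(iris, Species)
    res
    ```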