r dplyr rlang tidyeval

group_by with non-scalar character vectors using tidyeval


Using R 3.2.2 and dplyr 0.7.2, I'm trying to figure out how to use group_by effectively with fields supplied as character vectors.

Selecting is easy. I can select a field via a string like this:

(function(field) { 
  mpg %>% dplyr::select(field) 
})("cyl")

I can select multiple fields via multiple strings like this:

(function(...) { 
  mpg %>% dplyr::select(!!!quos(...)) 
})("cyl", "hwy")

and multiple fields via one character vector with length > 1 like this

(function(fields) {  
  mpg %>% dplyr::select(fields)  
})(c("cyl", "hwy"))

With group_by, I cannot find a way to do this for more than one string: whenever I manage to get any output, it ends up grouping by the literal string I supplied rather than by the column it names.

I managed to group by one string like this

(function(field) {  
  mpg %>% group_by(!!field := .data[[field]]) %>% tally() 
})("cyl")

Which is already quite ugly.

Does anyone know what I have to write so I can run

(function(field) {...})("cyl", "hwy")

and

(function(field) {...})(c("cyl", "hwy"))

respectively? I tried all sorts of combinations of !!, !!!, UQ, enquo, quos, unlist, etc., and saved them in intermediate variables because that sometimes seems to make a difference, but I cannot get it to work.


Solution

  • select() is special in dplyr: it accepts column names or positions rather than the columns themselves, which makes it about the only main verb that accepts strings. (Technically, when you supply a bare name like cyl to select, it gets evaluated as its own name, not as the vector inside the data frame.)

    If you want your function to take simple strings, as opposed to bare expressions or symbols, you don't need quosures. Just create symbols from the strings and unquote them:

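    library(dplyr)
    library(rlang)   # provides syms()

    myselect <- function(...) {
      # Turn each string into a symbol, then splice the symbols in as arguments
      vars <- syms(list(...))
      select(mtcars, !!! vars)
    }
    mygroup <- function(...) {
      vars <- syms(list(...))
      group_by(mtcars, !!! vars)
    }
    
    myselect("cyl", "disp")
    mygroup("cyl", "disp")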
    

    To debug the unquoting, wrap with expr() and check that the expression looks right:

    syms <- syms(list("cyl", "disp"))
    expr(group_by(mtcars, !!! syms))
    #> group_by(mtcars, cyl, disp)    # yup, looks right!
    

    See this talk for more on this (we'll update the programming vignette to make the concepts clearer): https://schd.ws/hosted_files/user2017/43/tidyeval-user.pdf.

    Finally, note that many verbs have an _at-suffixed variant that accepts strings and character vectors without fuss:

    group_by_at(mtcars, c("cyl", "disp"))
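
The question asks for one function that works both with several strings and with a single character vector. A minimal sketch of one way to cover both call styles with the symbol-splicing approach above follows. The helper name `group_tally` and its `data` argument are hypothetical and not part of the original answer, and the sketch assumes that every argument passed through `...` is a character vector.

```r
library(dplyr)
library(rlang)

# Hypothetical helper (not from the original answer): groups `data` by the
# columns named in `...` and counts the rows in each group.
group_tally <- function(data, ...) {
  # c() flattens separate strings and a single character vector alike,
  # so both call styles end up as one character vector of column names.
  fields <- c(...)
  # Convert each name to a symbol and splice the symbols into group_by().
  group_by(data, !!!syms(fields)) %>% tally()
}

by_strings <- group_tally(mtcars, "cyl", "gear")
by_vector  <- group_tally(mtcars, c("cyl", "gear"))

# Both call styles should produce the same grouped counts.
identical(by_strings, by_vector)
```

Using `c(...)` instead of `list(...)` is the design choice that lets a length-2 character vector be treated the same as two separate strings. If your dplyr is 1.0.0 or later, the `_at` variants are marked as superseded there, and `group_by(data, across(all_of(fields)))` is the documented way to group by a character vector of column names.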