
How to use "summarise" from dplyr with dynamic column names?


I am summarizing group means from a table using the summarize function from the dplyr package in R. I would like to do this dynamically, using a column name string stored in another variable.

The following is the "normal" way and it works, of course:

library(dplyr)
myTibble <- group_by( iris, Species)
summarise( myTibble, avg = mean( Sepal.Length))

# A tibble: 3 x 2
  Species     avg
  <fct>      <dbl>
1 setosa      5.01
2 versicolor  5.94
3 virginica   6.59

However, I would like to do something like this instead:

myTibble <- group_by( iris, Species)
colOfInterest <- "Sepal.Length"
summarise( myTibble, avg = mean( colOfInterest))

I've read the Programming with dplyr page, and I've tried a bunch of combinations of quo, enquo, !!, .dots=(...), etc., but I haven't figured out the right way to do it yet.

I'm also aware of this answer, but 1) when I use the standard-evaluation function summarise_, R tells me that it's deprecated, and 2) that answer doesn't seem elegant at all. So, is there a good, easy way to do this?

Thank you!


Solution

    1) Use !!sym(...) like this:

    library(dplyr)

    colOfInterest <- "Sepal.Length"
    iris %>%
      group_by(Species) %>%
      # sym() turns the string into a symbol; !! unquotes it inside mean()
      summarize(avg = mean(!!sym(colOfInterest))) %>%
      ungroup()
    

    giving:

    # A tibble: 3 x 2
      Species      avg
      <fct>      <dbl>
    1 setosa      5.01
    2 versicolor  5.94
    3 virginica   6.59
    

    2) A second approach is:

    colOfInterest <- "Sepal.Length"
    iris %>%
      group_by(Species) %>%
      # .data[[...]] looks the column up by its name in the current data
      summarize(avg = mean(.data[[colOfInterest]])) %>%
      ungroup()
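
If the column name changes from call to call, the second approach can be wrapped in a small function. The following is only a sketch: species_mean and its value_col argument are hypothetical names chosen for illustration and are not part of dplyr, and the grouping column is assumed to be Species, as in the question.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): per-species mean of one column,
# where the column is given as a string.
species_mean <- function(data, value_col) {
  data %>%
    group_by(Species) %>%
    # .data[[value_col]] picks the column whose name is stored in value_col
    summarize(avg = mean(.data[[value_col]])) %>%
    ungroup()
}

res <- species_mean(iris, "Sepal.Length")
resWidth <- species_mean(iris, "Petal.Width")
```

With this, res should hold the same three group means shown under 1), and the same call works for any other numeric column of iris.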
    

    Of course, this is straightforward in base R:

    aggregate(list(avg = iris[[colOfInterest]]), iris["Species"], mean)
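
As a quick sanity check, the three variants can be run side by side and compared. This is a minimal sketch; the variable names bySym, byData and byBase are made up for illustration.

```r
library(dplyr)

colOfInterest <- "Sepal.Length"

# Variant 1: convert the string to a symbol and unquote it
bySym <- iris %>%
  group_by(Species) %>%
  summarize(avg = mean(!!sym(colOfInterest))) %>%
  ungroup()

# Variant 2: look the column up through the .data pronoun
byData <- iris %>%
  group_by(Species) %>%
  summarize(avg = mean(.data[[colOfInterest]])) %>%
  ungroup()

# Base R: aggregate the selected column by Species
byBase <- aggregate(list(avg = iris[[colOfInterest]]), iris["Species"], mean)

# Both comparisons are expected to be TRUE
sameDplyr <- isTRUE(all.equal(bySym$avg, byData$avg))
sameBase <- isTRUE(all.equal(bySym$avg, byBase$avg))
```

Note that aggregate() returns a plain data frame rather than a tibble, so only the avg columns are compared here.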