Passing multiple columns from function's argument to group_by


Consider the following example:

library(tidyverse)

df <- tibble(
  cat = rep(1:2, times = 4, each = 2),
  loc = rep(c("a", "b"), each = 8),
  value = rnorm(16)
)

df %>% 
  group_by(cat, loc) %>% 
  summarise(mean = mean(value), .groups = "drop")

# # A tibble: 4 x 3
#     cat loc     mean
# * <int> <chr>  <dbl>
# 1     1 a     -0.563
# 2     1 b     -0.394
# 3     2 a      0.159
# 4     2 b      0.212

I would like to wrap the last two lines in a function that takes a group argument, so that multiple columns can be passed to group_by.

Here's a dummy function that computes the mean values by a group of columns as an example:

group_mean <- function(data, col_value, group) {
  data %>% 
    group_by(across(all_of(group))) %>% 
    summarise(mean = mean({{col_value}}), .groups = "drop")
}

group_mean(df, value, c("cat", "loc"))

# # A tibble: 4 x 3
#     cat loc     mean
# * <int> <chr>  <dbl>
# 1     1 a     -0.563
# 2     1 b     -0.394
# 3     2 a      0.159
# 4     2 b      0.212

The function works but I would prefer a tidyselect/rlang approach to avoid quoting column names, like so:

group_mean(df, value, c(cat, loc))

# Error: Problem adding computed columns in `group_by()`.
# x Problem with `mutate()` input `..1`.
# x object 'loc' not found
# ℹ Input `..1` is `across(all_of(c(cat, loc)))`.

Wrapping group in {{}} directly inside group_by() works for a single column but not for several. How can I pass multiple unquoted columns?


Solution

  • Consider using ... for the grouping columns. After converting the inputs to symbols with ensyms, the function accepts either quoted or unquoted column names:

    group_mean <- function(data, col_value, ...) {
      data %>% 
        group_by(!!! ensyms(...)) %>% 
        summarise(mean = mean({{col_value}}), .groups = "drop")
    }
    

    Testing:

    > group_mean(df, value, cat, loc)
    # A tibble: 4 x 3
        cat loc     mean
      <int> <chr>  <dbl>
    1     1 a      0.327
    2     1 b     -0.291
    3     2 a     -0.382
    4     2 b     -0.320
    > group_mean(df, value, 'cat', 'loc')
    # A tibble: 4 x 3
        cat loc     mean
      <int> <chr>  <dbl>
    1     1 a      0.327
    2     1 b     -0.291
    3     2 a     -0.382
    4     2 b     -0.320
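
To see why both calls above give the same result, here is a minimal sketch of what ensyms does on its own. The helper name capture_groups is hypothetical and exists only for this illustration; it assumes the rlang package is installed.

```r
library(rlang)

# Hypothetical helper, for illustration only: returns whatever ensyms() captures.
capture_groups <- function(...) {
  ensyms(...)
}

# One bare name and one string, mixed in the same call.
syms <- capture_groups(cat, "loc")

# Both inputs end up as symbols, regardless of how they were written.
vapply(syms, is_symbol, logical(1))

# The underlying column names, recovered as strings.
vapply(syms, as_string, character(1))
```

Note that ensyms only accepts bare names and strings. An expression such as cat + 1 raises an error instead of being captured, so this approach suits plain column names rather than computed groupings.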
    

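    If ... is already used for other arguments, one option is to capture the group expression with substitute and convert its parts to a character vector of column names: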

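    group_mean <- function(data, col_value, group) {
      # Capture the unevaluated argument, e.g. c(cat, loc) becomes list(c, cat, loc)
      grp_lst <- as.list(substitute(group))
      # Drop the leading function name `c`; a single bare column has length 1
      if (length(grp_lst) > 1) grp_lst <- grp_lst[-1]
      # Convert the remaining symbols to column-name strings
      grps <- purrr::map_chr(grp_lst, rlang::as_string)
      data %>% 
        group_by(across(all_of(grps))) %>% 
        summarise(mean = mean({{col_value}}), .groups = "drop")
    }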
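
The substitute step can be inspected in isolation. The sketch below uses only base R; the helper name show_parts is hypothetical and is there just to show what the function above sees before it drops the first element.

```r
# Hypothetical helper, for illustration only: exposes what substitute() captures.
show_parts <- function(group) {
  as.list(substitute(group))
}

# A call such as c(cat, loc) is split into the function name plus its arguments.
parts_many <- show_parts(c(cat, loc))

# A single bare column is a symbol, which becomes a one-element list.
parts_one <- show_parts(cat)

length(parts_many)  # three elements: c, cat, loc
length(parts_one)   # one element: cat
```

This is why the length check is needed: only the multi-column form carries a leading c that has to be removed.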
    

    Testing:

    > group_mean(df, value, c(cat, loc))
    # A tibble: 4 x 3
        cat loc     mean
      <int> <chr>  <dbl>
    1     1 a      0.327
    2     1 b     -0.291
    3     2 a     -0.382
    4     2 b     -0.320
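
A further possibility, not part of the answer above, is to embrace the group argument inside across, so that tidyselect handles the c(cat, loc) selection directly. The following is a minimal sketch, assuming dplyr 1.0.0 or later; the function name group_mean_across and the seed are chosen here purely for illustration.

```r
library(dplyr)

# Hypothetical variant: `group` is forwarded to across(), which accepts
# any tidyselect expression (a bare column, c(...), all_of(...), and so on).
group_mean_across <- function(data, col_value, group) {
  data %>%
    group_by(across({{ group }})) %>%
    summarise(mean = mean({{ col_value }}), .groups = "drop")
}

set.seed(1)  # arbitrary seed, only to make the sketch reproducible
df <- tibble(
  cat = rep(1:2, times = 4, each = 2),
  loc = rep(c("a", "b"), each = 8),
  value = rnorm(16)
)

# Several unquoted columns.
res_multi <- group_mean_across(df, value, c(cat, loc))

# A single unquoted column.
res_single <- group_mean_across(df, value, cat)

# Column names held as strings still work through all_of().
res_chr <- group_mean_across(df, value, all_of(c("cat", "loc")))
```

With this form the caller decides how to select the grouping columns, and the function body needs no manual handling of the captured expression.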