Consider the following example:
library(tidyverse)

df <- tibble(
  cat = rep(1:2, times = 4, each = 2),
  loc = rep(c("a", "b"), each = 8),
  value = rnorm(16)
)

df %>%
  group_by(cat, loc) %>%
  summarise(mean = mean(value), .groups = "drop")
# # A tibble: 4 x 3
# cat loc mean
# * <int> <chr> <dbl>
# 1 1 a -0.563
# 2 1 b -0.394
# 3 2 a 0.159
# 4 2 b 0.212
I would like to turn the last two lines into a function that takes a group argument to pass multiple columns to group_by.
Here's a dummy function that computes the mean values by a group of columns as an example:
group_mean <- function(data, col_value, group) {
  data %>%
    group_by(across(all_of(group))) %>%
    summarise(mean = mean({{ col_value }}), .groups = "drop")
}
group_mean(df, value, c("cat", "loc"))
# # A tibble: 4 x 3
# cat loc mean
# * <int> <chr> <dbl>
# 1 1 a -0.563
# 2 1 b -0.394
# 3 2 a 0.159
# 4 2 b 0.212
The function works, but I would prefer a tidyselect/rlang approach to avoid quoting column names, like so:
group_mean(df, value, c(cat, loc))
# Error: Problem adding computed columns in `group_by()`.
# x Problem with `mutate()` input `..1`.
# x object 'loc' not found
# ℹ Input `..1` is `across(all_of(c(cat, loc)))`.
Enclosing group in {{}} works for a single column but not for multiple columns. How can I do that?
Consider using the dots (...) for the grouping columns. After converting them to symbols with ensyms, the columns can be passed either quoted or unquoted:
group_mean <- function(data, col_value, ...) {
  data %>%
    group_by(!!!ensyms(...)) %>%
    summarise(mean = mean({{ col_value }}), .groups = "drop")
}
Testing:
> group_mean(df, value, cat, loc)
# A tibble: 4 x 3
cat loc mean
<int> <chr> <dbl>
1 1 a 0.327
2 1 b -0.291
3 2 a -0.382
4 2 b -0.320
> group_mean(df, value, 'cat', 'loc')
# A tibble: 4 x 3
cat loc mean
<int> <chr> <dbl>
1 1 a 0.327
2 1 b -0.291
3 2 a -0.382
4 2 b -0.320
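To see why both call styles give the same result, here is a minimal sketch of what ensyms does with the dots. The helper name capture_groups is hypothetical and only used for illustration; it is not part of the answer's function.

```r
library(rlang)

# Hypothetical helper: capture the dots as a list of symbols
capture_groups <- function(...) ensyms(...)

syms_unquoted <- capture_groups(cat, loc)
syms_quoted   <- capture_groups("cat", "loc")

# Both lists hold the symbols `cat` and `loc`; turn them back into
# strings to compare them
names_unquoted <- unname(vapply(syms_unquoted, as_string, character(1)))
names_quoted   <- unname(vapply(syms_quoted, as_string, character(1)))

print(names_unquoted)
print(names_quoted)
```

Because strings and bare names both end up as symbols, splicing them into group_by with !!! behaves the same in either case.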
If we are already using the dots (...) for other arguments, then one option is to capture the group argument unevaluated and convert its elements to strings:
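group_mean <- function(data, col_value, group) {
  # Capture the unevaluated expression, e.g. c(cat, loc) or a single name
  grp_lst <- as.list(substitute(group))
  # For a call such as c(cat, loc), drop the leading function name (c)
  if (length(grp_lst) > 1) grp_lst <- grp_lst[-1]
  # Convert the remaining symbols to strings so all_of() can use them
  grps <- purrr::map_chr(grp_lst, rlang::as_string)
  data %>%
    group_by(across(all_of(grps))) %>%
    summarise(mean = mean({{ col_value }}), .groups = "drop")
}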
Testing:
> group_mean(df, value, c(cat, loc))
# A tibble: 4 x 3
cat loc mean
<int> <chr> <dbl>
1 1 a 0.327
2 1 b -0.291
3 2 a -0.382
4 2 b -0.320
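Another possibility, closer to the tidyselect style asked for, is to embrace the group argument inside across(), since across() accepts a tidyselect expression such as c(cat, loc). This is only a sketch: the names group_mean_sel and df_demo are made up for this example, and the data uses fixed values instead of rnorm so the result is reproducible. It assumes dplyr 1.0.0 or later; with dplyr 1.1.0 or later, pick({{ group }}) should be usable in place of across({{ group }}).

```r
library(dplyr)

# Hypothetical variant of group_mean: `group` is a tidyselect expression
group_mean_sel <- function(data, col_value, group) {
  data %>%
    group_by(across({{ group }})) %>%
    summarise(mean = mean({{ col_value }}), .groups = "drop")
}

# Same layout as df in the question, but with deterministic values
df_demo <- tibble(
  cat = rep(1:2, times = 4, each = 2),
  loc = rep(c("a", "b"), each = 8),
  value = 1:16
)

# Several unquoted columns
res_multi <- group_mean_sel(df_demo, value, c(cat, loc))
# A single unquoted column
res_single <- group_mean_sel(df_demo, value, cat)
# Quoted names still work through all_of()
res_quoted <- group_mean_sel(df_demo, value, all_of(c("cat", "loc")))

print(res_multi)
```

The caller then decides how to select the grouping columns (bare names, all_of(), or helpers such as starts_with()), and the function body does not need substitute().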