I want to create a function based on dplyr that performs certain operations on subsets of data. The subsets are defined by values of one or more key columns in the dataset. When only one column is used to identify subsets, my code works fine:
library(dplyr)

set.seed(1)
df <- tibble(
  g1 = c(1, 1, 2, 2, 2),
  g2 = c(1, 2, 1, 2, 1),
  a = sample(5)
)
group_key <- "g1"
aggregate <- function(df, by) {
  df %>% group_by(!!sym(by)) %>% summarize(a = mean(a))
}
aggregate(df, by = group_key)
This works as expected and returns something like this:
# A tibble: 2 x 2
     g1     a
  <dbl> <dbl>
1     1   1.5
2     2   4
Unfortunately, everything breaks down if I change group_key:
group_key <- c("g1", "g2")
aggregate(df, by = group_key)
I get an error, "Only strings can be converted to symbols", which I think comes from rlang::sym(). Replacing it with syms() does not work, since that returns a list of symbols, on which group_by() chokes.
Any suggestions would be appreciated!
You need to use the unquote-splice operator !!!, which splices the list of symbols returned by syms() into group_by() as separate arguments:
aggregate <- function(df, by) {
  df %>% group_by(!!!syms(by)) %>% summarize(a = mean(a))
}
group_key <- c("g1", "g2")
aggregate(df, by = group_key)
# A tibble: 4 x 3
# Groups:   g1 [2]
#      g1    g2     a
#   <dbl> <dbl> <dbl>
# 1     1     1   1
# 2     1     2   4
# 3     2     1   2.5
# 4     2     2   5
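To see what the splice does, it can help to capture the call instead of running it. The following is only a small sketch: it assumes rlang is installed, and the variable names key_syms and spliced are made up for illustration. expr() records the call without evaluating it, so group_by() does not even need to be loaded here.

```r
library(rlang)

by <- c("g1", "g2")

# syms() converts each string to a symbol and returns them as a list
key_syms <- syms(by)

# expr() captures the call unevaluated; !!! splices the list elements
# in as separate arguments rather than passing the list itself
spliced <- expr(group_by(!!!key_syms))
print(spliced)
#> group_by(g1, g2)
```

So group_by(!!!syms(by)) ends up being evaluated as if group_by(g1, g2) had been typed by hand, which is why it works for any number of key columns.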