I am trying to understand the expected output of dplyr::group_by() in conjunction with dplyr::all_of(). My understanding is that dplyr::all_of() should convert character vectors containing variable names to bare names so that group_by() can use them as grouping columns, but this doesn't appear to happen.
Below, I generate some fake data, pass different objects to group_by() with and without all_of(), and calculate the number of observations in each group. In the example, passing a single bare column name without dplyr::all_of() produces the correct output: one row per unique value of the column. However, passing a character vector, with or without dplyr::all_of(), produces incorrect output: a single row regardless of the number of unique values in the column.
What is expected when using all_of(), and how might I alternatively pass a character vector to group_by() so that it is processed as a vector of bare names?
library(dplyr)
# Create a 40-row data.frame with 2 variables, each with 2 unique values.
# (var has 20 elements and is recycled to match the 40 elements of bar.)
df <- data.frame(var = rep(c("a", "b"), 10),
                 bar = rep(c(1, 2), 20))
# Output 1: 2x2 tibble - GOOD
df %>% group_by(var) %>% summarize(n = n())
# Output 2: 1x2 tibble - BAD
foo <- "var"
df %>% group_by(all_of(foo)) %>% summarize(n = n())
# Output 3: 1x2 tibble - BAD
df %>% group_by("var") %>% summarize(n = n())
# Output 4: Error in_var not found - BAD
foo2 <- list("var", "bar")
lapply(foo2, function(in_var) {
  df %>%
    group_by(in_var) %>%
    summarize(n = n())
})
# Output 5: list of length 2 where
# each element is a 1x2 tibble - BAD
foo2 <- list("var", "bar")
lapply(foo2, function(in_var) {
  df %>%
    group_by(all_of(in_var)) %>%
    summarize(n = n())
})
We can use group_by_at():
lapply(foo2, function(in_var) df %>%
    group_by_at(all_of(in_var)) %>%
    summarise(n = n()))
-output
#[[1]]
# A tibble: 2 x 2
#  var       n
#* <chr> <int>
#1 a        20
#2 b        20
#[[2]]
# A tibble: 2 x 2
#    bar     n
#* <dbl> <int>
#1     1    20
#2     2    20
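For context on why the attempts in the question return a single row: group_by() uses data masking, much like mutate(), so an expression such as "var" is evaluated as a value rather than looked up as a column name. A minimal sketch, reusing the question's df, suggests that the string is recycled into a new constant column, which leaves exactly one group:

```r
library(dplyr)

# Same data as in the question (40 rows after recycling var).
df <- data.frame(var = rep(c("a", "b"), 10),
                 bar = rep(c(1, 2), 20))

# The string "var" is treated as a value, not as a column name:
# group_by() adds a constant column and groups by that instead.
res <- df %>%
  group_by("var") %>%
  summarize(n = n())

names(res)  # the grouping column is named after the expression, quotes included
nrow(res)   # a single group, because the new column has one distinct value
```

all_of() is a selection helper, intended for selection contexts such as across() or select(), which is why it does not help when passed to group_by() directly.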
As across() replaces some of the functionality of group_by_at(), we can use it instead with all_of():
lapply(foo2, function(in_var) df %>%
    group_by(across(all_of(in_var))) %>%
    summarise(n = n()))
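Because all_of() accepts a character vector of any length, the same pattern can also group by several columns at once. A small sketch, reusing the question's df; the name group_cols is only illustrative and not part of the original code:

```r
library(dplyr)

# Same data as in the question.
df <- data.frame(var = rep(c("a", "b"), 10),
                 bar = rep(c(1, 2), 20))

# Hypothetical name: a character vector holding the grouping columns.
group_cols <- c("var", "bar")

# across(all_of(...)) selects every column named in the vector.
res <- df %>%
  group_by(across(all_of(group_cols))) %>%
  summarise(n = n(), .groups = "drop")

res
```

In this data var and bar vary together, so grouping by both still gives two groups.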
Or convert to a symbol with rlang::sym() and evaluate it with !!:
lapply(foo2, function(in_var) df %>%
    group_by(!! rlang::sym(in_var)) %>%
    summarise(n = n()))
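For more than one column name, the plural rlang::syms() combined with the splice operator !!! plays the same role. A sketch under the same assumptions as the question's data, with group_cols as an illustrative name:

```r
library(dplyr)

# Same data as in the question.
df <- data.frame(var = rep(c("a", "b"), 10),
                 bar = rep(c(1, 2), 20))

# Hypothetical name: the columns to group by, given as strings.
group_cols <- c("var", "bar")

# syms() turns each string into a symbol; !!! splices the list of
# symbols into group_by() as if they had been typed as bare names.
res <- df %>%
  group_by(!!! rlang::syms(group_cols)) %>%
  summarise(n = n(), .groups = "drop")

res
```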
Or use map() from purrr:
library(purrr)
map(foo2, ~ df %>%
    group_by(!! rlang::sym(.x)) %>%
    summarise(n = n()))
Or, instead of group_by() followed by summarise(), we can use count():
map(foo2, ~ df %>%
    count(across(all_of(.x))))
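If this is needed repeatedly, the count() variant can be wrapped in a small helper. The sketch below is only an illustration: count_by is a hypothetical name, not a dplyr function. It also shows that all_of() is strict and raises an error when a requested name is not a column:

```r
library(dplyr)

# Same data as in the question.
df <- data.frame(var = rep(c("a", "b"), 10),
                 bar = rep(c(1, 2), 20))

# Hypothetical helper: count rows per group, with the grouping
# columns supplied as a character vector.
count_by <- function(data, cols) {
  data %>% count(across(all_of(cols)))
}

by_var  <- count_by(df, "var")
by_both <- count_by(df, c("var", "bar"))

# all_of() errors on names that are not columns; capture the
# error here instead of stopping the script.
err <- tryCatch(count_by(df, "missing_col"), error = function(e) e)
inherits(err, "error")
```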