
R: What is the expected output of passing a character vector to dplyr::all_of()?


I am trying to understand the expected output of dplyr::group_by() when it is used together with dplyr::all_of(). My understanding is that dplyr::all_of() should convert a character vector of variable names into bare names that group_by() can use, but this does not appear to happen.

Below, I generate some fake data, pass different objects to group_by() with(out) all_of() and calculate the number of observations in each group. In the example, passing a single bare column name without dplyr::all_of() produces the correct output: one row per unique value of the column. However, passing character vectors or using dplyr::all_of() produces incorrect output: one row regardless of the number of values in a column.

What is the expected behaviour of all_of() here, and how else might I pass a character vector to group_by() so that it is treated as a vector of bare names?

library(dplyr)

# Create a 40-row data.frame with
# 2 variables, each with 2 unique values.
df <- data.frame(var = rep(c("a", "b"), 20),
                 bar = rep(c(1, 2), 20))

# Output 1: 2x2 tibble - GOOD
df %>% group_by(var) %>% summarize(n = n())

# Output 2: 1x2 tibble - BAD
foo <- "var"
df %>% group_by(all_of(foo)) %>% summarize(n = n())

# Output 3: 1x2 tibble
df %>% group_by("var") %>% summarize(n = n())

# Output 4: Error in_var not found - BAD
foo2 <- list("var", "bar")
lapply(foo2, function(in_var) {
  df %>%
    group_by(in_var) %>%
    summarize(n = n())
})

# Output 5: list of length 2 where
# each element is a 1x2 tibble - BAD
foo2 <- list("var", "bar")
lapply(foo2, function(in_var) {
  df %>%
    group_by(all_of(in_var)) %>%
    summarize(n = n())
})
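A quick check of what the single-row outputs are actually grouping by: group_by() evaluates its arguments with data masking, so a string such as "var" is treated as a constant value and added as a new grouping column, and a constant column yields exactly one group. A minimal sketch, assuming dplyr 1.0 or later:

```r
library(dplyr)

# Same data as above, with both columns written out at 40 values
df <- data.frame(var = rep(c("a", "b"), 20),
                 bar = rep(c(1, 2), 20))

# A string literal is evaluated as a constant, not looked up as a column
g_string <- df %>% group_by("var")
ncol(g_string)      # 3: a new constant grouping column was added
n_groups(g_string)  # 1: every row has the same value in it

# A bare name refers to the existing column
g_bare <- df %>% group_by(var)
n_groups(g_bare)    # 2
```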

Solution

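  • group_by() uses data masking rather than tidy selection, so all_of(), a selection helper, is not meant to be called directly inside it; the string ends up being evaluated as a constant value, which is why every row falls into a single group. We can use group_by_at() instead, whose .vars argument does use tidy selection: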

    lapply(foo2, function(in_var) df %>% 
          group_by_at(all_of(in_var)) %>% 
          summarise(n = n()))
    

    Output:

    #[[1]]
    # A tibble: 2 x 2
    #  var       n
    #* <chr> <int>
    #1 a        20
    #2 b        20
    
    #[[2]]
    # A tibble: 2 x 2
    #    bar     n
    #* <dbl> <int>
    #1     1    20
    #2     2    20
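    group_by_at() also accepts a plain character vector in .vars, so all_of() is optional there and several columns can be passed at once. A sketch, assuming dplyr 1.0 or later, where group_by_at() still works but is superseded:

    ```r
    library(dplyr)

    df <- data.frame(var = rep(c("a", "b"), 20),
                     bar = rep(c(1, 2), 20))

    # A character vector of column names goes straight into .vars
    cols <- c("var", "bar")
    res <- df %>%
      group_by_at(cols) %>%
      summarise(n = n(), .groups = "drop")
    res
    ```

    Because var and bar alternate in step in this data, only the combinations a/1 and b/2 occur, so the result has two rows of 20.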
    

    Since group_by_at() has been superseded by across(), the recommended form wraps all_of() in across(), which supplies the tidy-selection context that all_of() needs:

    lapply(foo2, function(in_var) df %>% 
          group_by(across(all_of(in_var))) %>% 
          summarise(n = n()))
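    The across(all_of()) form also takes a character vector with several names, which makes it easy to wrap in a reusable helper. A sketch; count_groups() is a hypothetical name chosen for illustration:

    ```r
    library(dplyr)

    df <- data.frame(var = rep(c("a", "b"), 20),
                     bar = rep(c(1, 2), 20))

    # Hypothetical helper: group by any character vector of column names
    count_groups <- function(data, cols) {
      data %>%
        group_by(across(all_of(cols))) %>%
        summarise(n = n(), .groups = "drop")
    }

    one  <- count_groups(df, "bar")
    both <- count_groups(df, c("var", "bar"))
    ```

    all_of() is strict: a name that is not a column, such as count_groups(df, "baz"), raises an error. any_of() is the lenient alternative that silently skips missing names.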
    

    Or convert the string to a symbol with rlang::sym() and inject it with !!:

    lapply(foo2, function(in_var) df %>% 
          group_by(!! rlang::sym(in_var)) %>% 
          summarise(n = n()))
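    For several columns at once, rlang::syms() turns a character vector into a list of symbols, and !!! splices them into group_by() as separate arguments. A sketch:

    ```r
    library(dplyr)

    df <- data.frame(var = rep(c("a", "b"), 20),
                     bar = rep(c(1, 2), 20))

    cols <- c("var", "bar")

    # !!! splices each symbol in the list as its own grouping argument
    grouped <- df %>% group_by(!!!rlang::syms(cols))
    res <- grouped %>% summarise(n = n(), .groups = "drop")
    ```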
    

    Or, equivalently, use purrr::map() instead of lapply():

    library(purrr)
    map(foo2, ~ df %>%
                  group_by(!! rlang::sym(.x)) %>%
                  summarise(n = n()))
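    Another option inside map() or lapply() is the .data pronoun, which looks a column up by its string name without building a symbol first. A sketch, assuming dplyr 1.0 or later:

    ```r
    library(dplyr)
    library(purrr)

    df <- data.frame(var = rep(c("a", "b"), 20),
                     bar = rep(c(1, 2), 20))

    foo2 <- list("var", "bar")

    # .data[[name]] refers to the column whose name is stored in .x
    res <- map(foo2, ~ df %>%
                 group_by(.data[[.x]]) %>%
                 summarise(n = n()))
    ```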
    

    Or replace group_by() plus summarise() with count():

    map(foo2, ~ df %>%
                  count(across(all_of(.x))))
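    count() is shorthand for grouping and then summarising with n = n(), so the two approaches should give the same counts. A quick sketch checking that on the example data:

    ```r
    library(dplyr)

    df <- data.frame(var = rep(c("a", "b"), 20),
                     bar = rep(c(1, 2), 20))

    via_count <- df %>% count(across(all_of("var")))
    via_summarise <- df %>%
      group_by(across(all_of("var"))) %>%
      summarise(n = n(), .groups = "drop")
    ```

    The two results can differ in class (count() keeps the input type, here a data.frame, while summarise() returns a tibble), so compare their columns rather than the objects as a whole.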