Tags: r, dataframe, tidyverse, tibble

Why should I use `all_of` to select columns?


I'm currently using R and came across the function `all_of()` in the tidyverse. What does this function exist for? It seems I can use just `x` everywhere that `all_of(x)` can be used.

Example:

```r
library(tidyverse)

tb <- tibble(a = 1:3, b = 1:3, c = 1:3)
x <- c("a", "b")

tb %>% select(all_of(x))
tb %>% select(x)

tb %>% select(-all_of(x))
tb %>% select(-x)
```

The two lines with `all_of()` return the same results as the ones without the extra function. Why should I bother using it?


Solution

  • This is a really nice question!

    It exists to make explicit what you actually want when selecting columns. Imagine this simple situation:

    ```r
    library(tidyverse)

    tb <- tibble(x = 1:3, y = 1:3, z = 1:3)
    x <- c("x", "y")

    tb %>% select(x)
    ```

    Do you see the problem? It is ambiguous whether you mean `x` as the external vector, and thus want two columns (`x` and `y`), or whether you want only the column `x`. tidyselect resolves the bare symbol to the column `x`, which may not be what you intended.

    That's why you should use `all_of()`, which states explicitly that the column names come from an external vector.

    More information can be found in the tidyselect documentation.
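A minimal sketch of the difference, assuming dplyr 1.0.0 or later (which re-exports `all_of()` from tidyselect) and the tibble package are installed. Because the data has a column named `x`, the bare symbol resolves to that column, while `all_of(x)` uses the external character vector; negative selections behave the same way. `pick_cols()` is a hypothetical helper, not part of any package, illustrating where this matters most: a function that receives column names as a character vector.

```r
# Assumes dplyr >= 1.0.0 (re-exports all_of() and %>%) and tibble are installed.
library(dplyr)

tb <- tibble::tibble(x = 1:3, y = 1:3, z = 1:3)
x <- c("x", "y")

# Bare `x`: the data has a column named `x`, so tidyselect picks that
# column and ignores the vector `x` in the environment.
bare <- tb %>% select(x)
names(bare)       # "x"

# all_of(x): `x` is taken as an external character vector of names.
explicit <- tb %>% select(all_of(x))
names(explicit)   # "x" "y"

# The same ambiguity applies to negative selections.
names(tb %>% select(-x))          # drops only column `x`: "y" "z"
names(tb %>% select(-all_of(x)))  # drops `x` and `y`: "z"

# Hypothetical helper: column names passed in by the caller go through
# all_of(), so a data column that happens to be called `cols` can never
# be selected by accident.
pick_cols <- function(df, cols) {
  df %>% select(all_of(cols))
}
names(pick_cols(tb, c("y", "z")))  # "y" "z"
```

Wrapping the vector in `all_of()` costs nothing when the names don't collide, and it removes the guesswork when they do.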