
Alternative to `any_of()` that works with NSE?


In a function, I sometimes need to select a variable only if it exists.

For this, the function dplyr::any_of() is perfect, but it only works with standard evaluation, taking a character vector as input.

I'm looking for an alternative that would work as a replacement in the following example that feels very hacky:

library(tidyverse) 
library(rlang)
f = function(data, x1, x2, gp){
  gpname =  as_label(enquo(gp))
  data %>%
    select(x1={{x1}}, x2={{x2}}, gp=any_of(gpname)) %>% 
    names()
}

iris %>% f(Sepal.Length,Sepal.Width,Species)
#> [1] "x1" "x2" "gp"
iris %>% f(Sepal.Length,Sepal.Width)
#> [1] "x1" "x2"

Created on 2024-03-15 with reprex v2.1.0

The function should run with or without Species, as in my reprex, but it should throw an error when asked for a column that does not exist (which my reprex does not do).


Solution

  • Given your last statement about wanting an error when an absent variable is requested, this isn't really an any_of() situation: erroring on columns that don't exist is what select() already does. It looks like you simply want the extra column name to be optional, i.e. a skippable parameter.

    This is achievable by giving the third argument a default of NULL, i.e.:

    library(tidyverse) 
    library(rlang)
    f2 = function(data, x1, x2,x3=NULL){
      data |>
        select({{x1}},
               {{x2}},
               {{x3}}) |> 
        names()
    }
    
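    iris |> f2(Sepal.Length,Sepal.Width,Species)  # third column is selected
    iris |> f2(Sepal.Length,Sepal.Width)          # x3 stays NULL, so nothing extra is selected
    iris |> f2(Sepal.Length,Sepal.Width,abc)      # errors: column `abc` doesn't exist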
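
    The NULL-default version drops the renaming (x1, x2, gp) used in the question. If the renaming matters, one possible sketch is to capture the optional argument with enquo() and test it with rlang::quo_is_null(). The name f3 is made up for illustration, and this is only one way to write it:

    ```r
    library(dplyr)
    library(rlang)

    # Hypothetical helper (not from the question): `gp` is optional and,
    # when supplied, is selected and renamed to "gp".
    f3 <- function(data, x1, x2, gp = NULL) {
      gp_quo <- enquo(gp)
      if (quo_is_null(gp_quo)) {
        # Argument left at its NULL default: select only the two required columns
        out <- select(data, x1 = {{ x1 }}, x2 = {{ x2 }})
      } else {
        # Argument supplied: select() errors if the column does not exist
        out <- select(data, x1 = {{ x1 }}, x2 = {{ x2 }}, gp = !!gp_quo)
      }
      names(out)
    }

    f3(iris, Sepal.Length, Sepal.Width, Species)
    #> [1] "x1" "x2" "gp"
    f3(iris, Sepal.Length, Sepal.Width)
    #> [1] "x1" "x2"
    try(f3(iris, Sepal.Length, Sepal.Width, abc))  # errors: column `abc` doesn't exist
    ```

    The explicit branch keeps the "optional" case separate from the "column is missing from the data" case, so a misspelled column still fails loudly.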
    

    We could generalise this to accept any number of optional columns by passing them through the dots (...).

    library(tidyverse) 
    library(rlang)
    f = function(data, x1, x2,...){
      data |>
        select({{x1}},
               {{x2}},
               !!!rlang::enquos(...)) |> 
        names()
    }
    
    
    # runs fine with a variable that is present
    iris |> f(Sepal.Length,Sepal.Width,Species)
    #  "Sepal.Length" "Sepal.Width"  "Species" 
    
    # it's optional and can be left out
    iris |> f(Sepal.Length,Sepal.Width)
    #  "Sepal.Length" "Sepal.Width"  
    
    # trying an absent variable correctly errors
    iris |> f(Sepal.Length,Sepal.Width,SpeciesX)
    # Error in `select()`:
    #   ! Can't select columns that don't exist.
    # ✖ Column `SpeciesX` doesn't exist.
    # Run `rlang::last_trace()` to see where the error occurred.
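
    As a possible variation on the dots approach, select() also accepts ... forwarded directly, and a named argument in the dots renames the column, which brings back the gp name from the question. The name f_dots is hypothetical, and this is a sketch rather than the only way to do it:

    ```r
    library(dplyr)

    # Hypothetical variant: the dots are forwarded to select() as they are,
    # so the caller can pass `new_name = column` to rename an optional column.
    f_dots <- function(data, x1, x2, ...) {
      data |>
        select(x1 = {{ x1 }}, x2 = {{ x2 }}, ...) |>
        names()
    }

    f_dots(iris, Sepal.Length, Sepal.Width, gp = Species)
    #> [1] "x1" "x2" "gp"
    f_dots(iris, Sepal.Length, Sepal.Width)
    #> [1] "x1" "x2"
    try(f_dots(iris, Sepal.Length, Sepal.Width, gp = SpeciesX))  # errors: column doesn't exist
    ```

    Here the caller, not the function, decides the output name of the optional column. If the name must always be gp, a named argument with a NULL default is the better fit.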