
What's the right way to say “match anything” with dplyr::filter?


What is the proper way to use dplyr::filter programmatically so that it matches any value, depending on the value of the filter criterion?

For example, I want to be able to write

my_filter(df, some_var == 1, another_var == 'any')

and have this return the same result as

filter(df, some_var == 1)

That is, the special value 'any' means “don't filter on this variable at all”.

I was thinking of making a wrapper that takes the ellipsis ... and removes any arguments that have the special value, but of course that doesn't work, because of the semantics of dplyr's bare arguments and tidyeval's quosures.


Solution

  • This is the kind of thing that is much more natural to do at the level of functions than with metaprogramming. For instance, you could create your own version of == that treats "any" as a special value:

    my_equals <- function(x, y) {
      # identical() avoids an if() error when y is a vector (e.g. another column) or NA
      if (identical(y, "any")) {
        TRUE
      } else {
        x == y
      }
    }
    

    Then you can use it in filter():

    library("dplyr")
    
    filter(mtcars, my_equals(cyl, "6"), my_equals(am, "1"))
    #>    mpg cyl disp  hp drat    wt  qsec vs am gear carb
    #> 1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
    #> 2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
    #> 3 19.7   6  145 175 3.62 2.770 15.50  0  1    5    6
    
    filter(mtcars, my_equals(cyl, "6"), my_equals(am, "any"))
    #>    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
    #> 1 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
    #> 2 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
    #> 3 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
    #> 4 18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
    #> 5 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
    #> 6 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
    #> 7 19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
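
The "any" case works because my_equals() returns a single TRUE, which is recycled across every row. The same selection can be reproduced in base R with logical indexing, a convenient way to check my_equals() without dplyr. This is a minimal sketch: keep_1 and keep_2 are illustrative names, and my_equals() is defined inline (using identical() so a vector or NA y cannot break if()) so the snippet runs on its own:

```r
# Base-R check of the function-level approach (no dplyr needed).
my_equals <- function(x, y) {
  if (identical(y, "any")) TRUE else x == y
}

# my_equals(mtcars$am, "any") is a single TRUE; `&` recycles it,
# so that condition effectively drops out.
keep_1 <- my_equals(mtcars$cyl, "6") & my_equals(mtcars$am, "1")
keep_2 <- my_equals(mtcars$cyl, "6") & my_equals(mtcars$am, "any")

mtcars[keep_1, c("mpg", "cyl", "am")]  # 6-cylinder manual cars
mtcars[keep_2, c("mpg", "cyl", "am")]  # all 6-cylinder cars
```

Note that comparing the numeric cyl column with the string "6" relies on R coercing both sides to character before comparing.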
    

    If you really want to use == instead of a normal function, you can capture the quosures and modify their environments so that == is masked by your function. Fortunately, this is easy to do:

    library("rlang")
    
    quo_mask_equals <- function(quo) {
      # Create a child of the quosure environment which contains a binding
      # that masks `==` with our own function:
      env <- env(quo_get_env(quo), `==` = my_equals)
      quo_set_env(quo, env)
    }
    
    my_filter <- function(.data, ...) {
      quos <- lapply(enquos(...), quo_mask_equals)
      filter(.data, !!!quos)
    }
    
    my_filter(mtcars, cyl == "6", am == "1")
    my_filter(mtcars, cyl == "6", am == "any")
    #> *Same results as above*
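
The masking trick relies on ordinary lexical scoping: names in a filter() expression are looked up roughly in the data columns first, then in the environment attached to the expression, so a binding for == in that environment shadows base R's. A rough base-R analogue uses eval() with a data frame and an enclosing environment instead of rlang quosures. It only illustrates the lookup rule (mask_env is a hypothetical name) and is not how dplyr evaluates its arguments internally:

```r
# Base-R analogue of masking `==` (no rlang or dplyr needed).
my_equals <- function(x, y) {
  if (identical(y, "any")) TRUE else x == y
}

# A child of the global environment in which `==` is bound to my_equals().
mask_env <- new.env(parent = globalenv())
assign("==", my_equals, envir = mask_env)

# eval() looks up `am` in the data frame first; `==` is not a column,
# so it is found in mask_env and resolves to my_equals().
cond_1 <- eval(quote(am == "1"), mtcars, mask_env)
cond_2 <- eval(quote(am == "any"), mtcars, mask_env)

# Outside mask_env, `==` is still base R's operator.
base_check <- (1 == 1)
```

Inside my_equals() itself, x == y still uses base R's ==, because the function's own enclosing environment is the global environment rather than mask_env.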
    

    However, I don't recommend writing or using this kind of UI because it isn't compatible with the usual R semantics for ==. At the very least, I would use a special sentinel value rather than a string that could very well occur naturally in the data:

    ANY <- function() structure(list(), class = "my_any")
    
    my_equals <- function(x, y) {
      if (inherits(x, "my_any") || inherits(y, "my_any")) {
        TRUE
      } else {
        x == y
      }
    }
    
    my_filter(mtcars, cyl == "6", am == ANY())
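
The advantage of the sentinel shows up when the data legitimately contains the string "any": that value is then compared like any other, and only ANY() means "match everything". Below is a minimal base-R check of the sentinel version of my_equals(), redefined here so the snippet stands alone. The status vector is made-up example data:

```r
# Sentinel-based my_equals(), checked with base R only.
ANY <- function() structure(list(), class = "my_any")

my_equals <- function(x, y) {
  if (inherits(x, "my_any") || inherits(y, "my_any")) TRUE else x == y
}

# A column that happens to contain the literal string "any".
status <- c("any", "none", "any")

by_string   <- my_equals(status, "any")  # ordinary element-wise comparison
by_sentinel <- my_equals(status, ANY())  # single TRUE: matches every element

# Combined with another condition, as filter() would combine them with `&`.
keep <- my_equals(mtcars$cyl, "6") & my_equals(mtcars$am, ANY())
```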