
Obsolete data mask. Too late to resolve `xxxxxx` after the end of `dplyr::mutate()`


As part of my answer to this post, I suggested a completely generic mechanism by which one data frame could be filtered by conditions stored in another. The OP has called me out (damn!) and asked me for an implementation.

My solution requires me to store functions in the filter dataframe. This is possible: this post shows how.

As a basic example, consider

library(tidyverse)

longFilterTable <- tribble(
  ~var,   ~value,
  "gear", list(3),
) %>% 
  mutate(
    func=pmap(
      list(value),
      ~function(x) x == ..1[[1]]
    )
  )

longFilterTable
# A tibble: 1 x 3
  var   value      func  
  <chr> <list>     <list>
1 gear  <list [1]> <fn>  

This is a very convoluted way of saying "select only those rows (of mtcars) for which gear is 3". This works:

mtcars %>% filter(longFilterTable$func[[1]](gear))
                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
<11 rows deleted for brevity>
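As a quick sanity check (a sketch assuming only dplyr, purrr and the built-in mtcars data), the stored function is an ordinary predicate on a vector, and filtering through it should return exactly the same rows as a direct filter on gear == 3:

```r
library(dplyr)
library(purrr)

# Rebuild the one-row filter table from above
longFilterTable <- tribble(
  ~var,   ~value,
  "gear", list(3),
) %>%
  mutate(func = pmap(list(value), ~function(x) x == ..1[[1]]))

# The stored predicate is an ordinary function of a vector
longFilterTable$func[[1]](c(3, 4, 5))
#> [1]  TRUE FALSE FALSE

# Filtering through the table should match filtering on gear == 3 directly
viaTable <- mtcars %>% filter(longFilterTable$func[[1]](gear))
direct   <- mtcars %>% filter(gear == 3)
identical(viaTable, direct)
#> [1] TRUE
```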

Now suppose I want more flexibility in the criterion. I might, for example, want to select either a range of values or a fixed value. This seems to be a reasonable extension of the filter dataset above:

longFilterTable <- tribble(
  ~var,   ~value,         ~condition,
  "gear", list(3),        "equal",
  "wt",   list(3,4, 3.9), "range",
) %>% 
  mutate(
    func=pmap(
      list(value, condition),
      ~function(x) {
        case_when(
          condition == "equal" ~ x == ..1[[1]],
          condition == "range" ~ x >= ..1[[1]][1] & x <= ..1[[1]][2],
          TRUE ~ x
        )
      }
    )
  )

longFilterTable
# A tibble: 2 x 4
  var   value      condition func  
  <chr> <list>     <chr>     <list>
1 gear  <list [1]> equal     <fn>  
2 wt    <list [3]> range     <fn>  

But now when I try to apply the filter, I get:

mtcars %>% filter(longFilterTable$func[[1]](gear))
 Error: Problem with `filter()` input `..1`.
x Obsolete data mask.
x Too late to resolve `condition` after the end of `dplyr::mutate()`.
ℹ Did you save an object that uses `condition` lazily in a column in the `dplyr::mutate()` expression ?
ℹ Input `..1` is `longFilterTable$func[[1]](gear)`.

I've played around with various combinations of deparse(), substitute(), expression(), force() and eval(), but to no avail. Can anyone find a solution?
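The error can be reproduced with a much smaller table. In the sketch below, the one-column tibble with a threshold column and the make_equal() helper are hypothetical illustrations, not part of the original post. A function built inside mutate() whose body names a column fails once mutate() has returned, whereas a function returned by a factory defined outside mutate(), which captures the value it was given, keeps working. The exact wording of the error may vary between dplyr versions.

```r
library(dplyr)
library(purrr)

tbl <- tibble(threshold = 3)

# Broken: the function body names the column `threshold`, which can only be
# resolved through mutate()'s data mask, and that mask is gone afterwards
broken <- tbl %>%
  mutate(func = map(threshold, ~function(x) x == threshold))

msg <- tryCatch(broken$func[[1]](3), error = conditionMessage)
msg

# Fix (hypothetical helper): a factory defined outside mutate() whose
# returned closure holds the value itself rather than a column name
make_equal <- function(value) {
  force(value)  # evaluate the argument now, not lazily later
  function(x) x == value
}

fixed <- tbl %>% mutate(func = map(threshold, make_equal))
fixed$func[[1]](c(2, 3, 4))
#> [1] FALSE  TRUE FALSE
```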


Solution

  • Your problem is that all branches of case_when() are always evaluated, and their results are checked for a consistent output type:

    x <- 1
    
    dplyr::case_when(x < 2 ~ TRUE,
                     x < 0 ~ FALSE)
    #> [1] TRUE
    
    dplyr::case_when(x < 2 ~ TRUE,
                     x < 0 ~ stop())
    #> Error in eval_tidy(pair$rhs, env = default_env):
    

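    In your case you want the first branch, the equality check. However, the range branch is evaluated too, even though the value list of the gear filter holds only one number, so ..1[[1]][2] is NA (the values are stored as a list of single numbers, so the bounds are really ..1[[1]] and ..1[[2]]). More fundamentally, the function body refers to condition, a column that only exists inside the data mask of mutate(). The function is not called until later, inside filter(), by which time that data mask is obsolete; this is exactly what the error message reports. Switching from case_when to a regular if/else clause that tests ..2 (the condition handed over by pmap()) instead of the column name solves both issues.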

    library(purrr)
    library(dplyr)
    longFilterTable <- tribble(
      ~var,   ~value,         ~condition,
      "gear", list(3),        "equal",
      "wt",   list(3.4, 3.9), "range",
    ) %>% 
      mutate(
        func=pmap(
          list(value, condition),
          ~function(x) {
            if(..2 == "equal") x == ..1[[1]]
            else if (..2 == "range") x >= ..1[[1]] & x <= ..1[[2]]
            else TRUE
          }
        )
      )
    
    
    # func[[2]] is the "range" filter (3.4 to 3.9); here it is applied to drat
    mtcars %>% filter(longFilterTable$func[[2]](drat))
    #>                mpg cyl  disp  hp drat    wt  qsec vs am gear carb
    #> Mazda RX4     21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
    #> Mazda RX4 Wag 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
    #> Datsun 710    22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
    #> Merc 240D     24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
    #> Toyota Corona 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
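Building on the if/else approach, the function construction can also be moved into a factory defined outside mutate(), and every row of the filter table can then be applied at once, which is the generic mechanism the question was after. This is a sketch, not the answer's code: make_predicate() and apply_filters() are hypothetical helpers, switch() stands in for the if/else chain, and a row is kept only if it satisfies all filters (they are combined with &).

```r
library(dplyr)
library(purrr)

# Hypothetical helper: turn one (value, condition) pair into a predicate.
# Defined outside mutate(), so the returned closure captures plain values
# rather than data-mask columns.
make_predicate <- function(value, condition) {
  force(value)
  force(condition)
  switch(condition,
    equal = function(x) x == value[[1]],
    range = function(x) x >= value[[1]] & x <= value[[2]],
    function(x) rep(TRUE, length(x))  # unknown condition: keep every row
  )
}

longFilterTable <- tribble(
  ~var,   ~value,         ~condition,
  "gear", list(3),        "equal",
  "wt",   list(3.4, 3.9), "range",
) %>%
  mutate(func = map2(value, condition, make_predicate))

# Hypothetical helper: evaluate each stored predicate on its column of df
# and keep the rows for which all of them are TRUE
apply_filters <- function(df, filters) {
  keep <- map2(filters$var, filters$func, function(v, f) f(df[[v]]))
  all_ok <- reduce(keep, `&`)
  filter(df, .env$all_ok)  # .env avoids clashing with a column named all_ok
}

result <- apply_filters(mtcars, longFilterTable)
result
```

Because each predicate only sees the vector it is given, new condition types can be added as extra switch() branches without touching apply_filters().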