As part of my answer to this post, I suggested a completely generic mechanism by which one data frame could be filtered by conditions stored in another. The OP has called me out (damn!) and asked me for an implementation.
My solution requires me to store functions in the filter data frame. This is possible: this post shows how.
As a basic example, consider:
library(tidyverse)
longFilterTable <- tribble(
  ~var,   ~value,
  "gear", list(3),
) %>%
  mutate(
    func = pmap(
      list(value),
      ~function(x) x == ..1[[1]]
    )
  )
longFilterTable
# A tibble: 1 x 3
var value func
<chr> <list> <list>
1 gear <list [1]> <fn>
This is a very convoluted way of saying "select only those rows (of mtcars) for which gear is 3". This works:
mtcars %>% filter(longFilterTable$func[[1]](gear)) %>% head(3)
mpg cyl disp hp drat wt qsec vs am gear carb
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Now suppose I want more flexibility in the criterion. I might, for example, want to select either a range of values or a fixed value. This seems to be a reasonable extension of the filter dataset above:
longFilterTable <- tribble(
  ~var,   ~value,          ~condition,
  "gear", list(3),         "equal",
  "wt",   list(3,4, 3.9),  "range",
) %>%
  mutate(
    func = pmap(
      list(value, condition),
      ~function(x) {
        case_when(
          condition == "equal" ~ x == ..1[[1]],
          condition == "range" ~ x >= ..1[[1]][1] & x <= ..1[[1]][2],
          TRUE ~ x
        )
      }
    )
  )
longFilterTable
# A tibble: 2 x 4
var value condition func
<chr> <list> <chr> <list>
1 gear <list [1]> equal <fn>
2 wt <list [3]> range <fn>
But now when I try to apply the filter, I get:
mtcars %>% filter(longFilterTable$func[[1]](gear))
Error: Problem with `filter()` input `..1`.
x Obsolete data mask.
x Too late to resolve `condition` after the end of `dplyr::mutate()`.
ℹ Did you save an object that uses `condition` lazily in a column in the `dplyr::mutate()` expression ?
ℹ Input `..1` is `longFilterTable$func[[1]](gear)`.
I've played around with various combinations of deparse(), substitute(), expression(), force() and eval(), but to no avail. Can anyone find a solution?
Your problem is that all branches of case_when() are always evaluated, and their results are checked for a consistent output type:
x <- 1
dplyr::case_when(x < 2 ~ TRUE,
                 x < 0 ~ FALSE)
#> [1] TRUE
dplyr::case_when(x < 2 ~ TRUE,
                 x < 0 ~ stop())
#> Error in eval_tidy(pair$rhs, env = default_env):
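A plain if/else, by contrast, evaluates only the branch that is selected, so the unused branch never runs. A minimal base-R illustration (the variable names are arbitrary):

```r
x <- 1
# Only the first branch is evaluated; stop() is never reached.
res <- if (x < 2) TRUE else stop("never evaluated")
res
#> [1] TRUE
```

Note that if/else is not vectorised: the condition must be a single value. That is why the fixed code below branches on the scalar condition string rather than on x.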
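In your case, you want to use the first option, checking for equality. However, the range branch is also evaluated, even though each element of the value list is a single number, so ..1[[1]][2] is NA. Switching from case_when() to a regular if/else clause means only the branch you need is evaluated. Note also that the rewrite below refers to the condition through the pmap() argument ..2 rather than the column condition: inside the stored function, condition is only looked up when the function is called, which is after mutate() has finished, and that is exactly what the "Obsolete data mask" error complains about.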
library(purrr)
library(dplyr)
longFilterTable <- tribble(
  ~var,   ~value,          ~condition,
  "gear", list(3),         "equal",
  "wt",   list(3.4, 3.9),  "range",
) %>%
  mutate(
    func = pmap(
      list(value, condition),
      # ..1 and ..2 are the pmap() arguments (the value list and the
      # condition string), not columns of the mutate() data mask
      ~function(x) {
        if (..2 == "equal") x == ..1[[1]]
        else if (..2 == "range") x >= ..1[[1]] & x <= ..1[[2]]
        else TRUE
      }
    )
  )
mtcars %>% filter(longFilterTable$func[[2]](drat))
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
#> Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
#> Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
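The same idea extends to applying the whole filter table at once, which is the generic mechanism the question is after. Below is a minimal base-R sketch, assuming one row per condition and that all conditions must hold (combined with &). make_filter() and apply_filter_table() are hypothetical helpers, not functions from purrr or dplyr. Each predicate is built by a function factory that force()s its arguments, so the stored closures hold their own values and do not depend on any mutate() data mask.

```r
# Hypothetical helper: build a predicate from one row of the filter table.
# force() evaluates the arguments now, so the returned closure keeps the
# actual value/condition instead of a lazy reference.
make_filter <- function(value, condition) {
  force(value)
  force(condition)
  switch(condition,
    equal = function(x) x == value[[1]],
    range = function(x) x >= value[[1]] & x <= value[[2]],
    function(x) rep(TRUE, length(x))  # unknown condition: keep every row
  )
}

# Hypothetical helper: evaluate every predicate on its column and keep the
# rows of `data` that satisfy all of them. which() drops NA results,
# as dplyr::filter() does.
apply_filter_table <- function(data, table) {
  checks <- Map(function(v, f) f(data[[v]]), table$var, table$func)
  keep <- Reduce(`&`, checks)
  data[which(keep), , drop = FALSE]
}

filter_table <- data.frame(var = c("gear", "wt"),
                           condition = c("equal", "range"),
                           stringsAsFactors = FALSE)
filter_table$value <- list(list(3), list(3.4, 3.9))  # list column
filter_table$func <- Map(make_filter, filter_table$value, filter_table$condition)

head(apply_filter_table(mtcars, filter_table), 3)
```

Replacing `&` with `|` in the Reduce() call would give "any condition" semantics instead of "all conditions".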