I want to filter a data frame on the values in multiple columns, without hard-coding the columns and values inside the dplyr::filter call. Essentially, I want to avoid this:
df_in <- data.frame(
a = c("first", "first", "first", "first", "last", "last", "last", "last"),
b = c("second", "second", "loser", "loser", "second", "second", "loser", "loser"),
c = 1:8
)
df_in
df_out <- df_in %>%
dplyr::filter(
!grepl("a", a), b == "second", c < 5 ## I want to avoid burying this in my code
)
df_out
I want to do something like this, with an imaginary prep_function and eval_function:
filt_crit <- prep_function(!grepl("a", a), b == "second", c < 5)
df_out <- df_in %>% dplyr::filter(eval_function(filt_crit))
df_out
I can use rlang::expr to filter based on one criterion:
filt_crit1 <- rlang::expr(!grepl("a", a))
df_partial <- df_in %>% dplyr::filter(eval(filt_crit1))
df_partial
I've figured out a way to do this with purrr::reduce(dplyr::filter(...)), iterating over filt_crit:
filt_crit <- c(rlang::expr(!grepl("a", a)), rlang::expr(b == "second"), rlang::expr(c < 5))
df_out <- filt_crit %>%
purrr::reduce(\(acc, nxt) dplyr::filter(acc, eval(nxt)), .init = df_in)
df_out
This seems a bit clunky. Is purrr::reduce the most straightforward solution? Thanks!
You can achieve your desired result by wrapping your filter conditions in rlang::exprs to create a list of expressions, then passing the conditions to dplyr::filter using the splice operator !!!:
df_in <- data.frame(
a = c("first", "first", "first", "first", "last", "last", "last", "last"),
b = c("second", "second", "loser", "loser", "second", "second", "loser", "loser"),
c = 1:8
)
library(dplyr, warn.conflicts = FALSE)
.filt_crit <- rlang::exprs(!grepl("a", a), b == "second", c < 5)
df_in |> filter(!!!.filt_crit)
#> a b c
#> 1 first second 1
#> 2 first second 2
.filt_crit <- rlang::exprs(!grepl("a", a), c < 5)
df_in |> filter(!!!.filt_crit)
#> a b c
#> 1 first second 1
#> 2 first second 2
#> 3 first loser 3
#> 4 first loser 4
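Splicing the list with !!! passes each expression to filter as a separate argument, so the conditions are combined with AND, just as in the hard-coded call. Below is a minimal sketch of how that could be packaged for reuse. The helper filter_by and the variable names are hypothetical, not part of dplyr or rlang, and the string-based variant assumes you trust the text being parsed.

```r
library(dplyr, warn.conflicts = FALSE)

df_in <- data.frame(
  a = c("first", "first", "first", "first", "last", "last", "last", "last"),
  b = c("second", "second", "loser", "loser", "second", "second", "loser", "loser"),
  c = 1:8
)

# Criteria defined once, away from the pipeline
filt_crit <- rlang::exprs(!grepl("a", a), b == "second", c < 5)

# Hypothetical helper: splice a list of expressions into filter()
filter_by <- function(df, crit) {
  dplyr::filter(df, !!!crit)
}

df_spliced <- filter_by(df_in, filt_crit)

# The same filter written out by hand, for comparison
df_hardcoded <- dplyr::filter(df_in, !grepl("a", a), b == "second", c < 5)
identical(df_spliced, df_hardcoded)

# Criteria kept as text (e.g. read from a config file) can be parsed
# into expressions first; unname() ensures no names reach filter(),
# which rejects named arguments
crit_text <- c("!grepl('a', a)", "b == 'second'", "c < 5")
filt_crit_txt <- unname(rlang::parse_exprs(crit_text))
df_from_text <- filter_by(df_in, filt_crit_txt)
```

Because the criteria are an ordinary list, you can also subset or extend them before splicing, for example filter_by(df_in, filt_crit[c(1, 3)]) to drop the condition on b.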