Filter Dataframe with Tidy Evaluation

I am handling a large dataset. First, for certain columns (X1, X2, ...), I am trying to identify a range of value (a, b) consists of repeated value (a > n, b > n). Next, I wish to filter row based on the condition which matches respective columns to result given in the previous step.

Here is a reproducible example simulating the scenario I am facing,

library(tidyverse)

set.seed(1122)

vecs <- lapply(X = 1:2, function(x) rep(c(1, 2, 3), times = 10) %>% sample() %>% head(10))
names(vecs) <- paste0("col_", 1:2)
dat <- vecs %>% as.data.frame()
dat

   col_1 col_2
1      3     2
2      1     1
3      1     1
4      1     2
5      1     2
6      3     3
7      3     3
8      2     1
9      1     3
10     2     2

I am able to identify the range by the following method,

# Which col has repeated value more than 3 appearances?
more_than_3 <- function(df, var){
    var <- rlang::sym(var)

    df %>% 
        group_by(!!var) %>% 
        summarise(n = n()) %>% 
        filter(n > 3) %>% 
        pull(!!var) %>% 
        range()
}
cols_name <- c("col_1", "col_2")
some_range <- purrr::map(cols_name, more_than_3, df = dat)
names(some_range) <- cols_name
some_range

$col_1
[1] 1 1

$col_2
[1] 2 2

However, to filter out values that fall outside the upper limit, this is what I do.

dat %>% 
    filter(col_1 <= some_range[["col_1"]][2], 
           col_2 <= some_range[["col_2"]][2])

  col_1 col_2
1     1     1
2     1     1
3     1     2
4     1     2

I believe there must be a more efficient and elegant way of filtering the result based on tidy evaluation. Can someone point me to the right direction?

Many thanks in advance.

Solution

First let's try to create a small function that creates a single filter expression for one column. This function will take a symbol and then transform to string but it could be the other way around:

new_my_filter_call_upper <- function(sym, range) {
  col_name <- as.character(sym)

  col_range <- range[[col_name]]
  if (is.null(col_range)) {
    stop(sprintf("Can't find column `%s` to compute range", col_name), call. = FALSE)
  }

  expr(!!sym < !!col_range[[2]])
}

Let's try it:

new_my_filter_call_upper(quote(foobar), some_range)
#> Error: Can't find column `foobar` to compute range

# It works!
new_my_filter_call_upper(quote(col_1), some_range)
#> col_1 < 3

Now we're ready to create a pipeline verbs that will take a data frame and bare column names.

# Probably cleaner to pass range as argument. Prefix with dot to allow
# columns named `range`.
my_filter <- function(.data, ..., .range) {
  # ensyms() guarantees there won't be complex expressions
  syms <- rlang::ensyms(...)

  # Now let's map the function to create many calls:
  calls <- purrr::map(syms, new_my_filter_call_upper, range = .range)

  # And we're ready to filter with those expressions:
  dplyr::filter(.data, !!!calls)
}

Let's try it:

dat %>% my_filter(col_1, col_2, .range = some_range)
#>   col_1 col_2 NA.
#> 1     2     1   1
#> 2     2     2   1