
How to use str_detect within across when searching multiple columns for several search strings


I'm looking to migrate my functions to the newly introduced across.

A function I have searches for several keywords across several columns using filter_at.

However, I am struggling to replicate this using across as shown below:

library(tidyverse)

raw_df <- tibble::tribble(
  ~cust_name, ~other_desc, ~trans, ~val,
     "Cisco",   "nothing",    "a", 100L,
    "bad_cs",     "cisCo",    "s", 101L,
       "Ibm",   "nothing",    "d", 102L,
    "bad_ib",       "ibM",    "f", 102L,
    "oraCle",    "Oracle",    "g", 103L,
      "mSft",   "nothing",    "k", 103L,
      "noth",      "Msft",    "j", 104L,
      "noth",    "oracle",    "l", 104L
  )


search_string <- c("ibm", "cisco")


# Done using `filter_at`
raw_df %>% 
  filter_at(.vars = vars(cust_name, other_desc),
            .vars_predicate = any_vars(str_detect(., regex(paste(search_string, collapse = "|"), ignore_case = TRUE)))
            
  ) %>% unique()
  
  
# Not able to replicate result with `across`
raw_df %>% 
  filter(across(
    .cols = c(cust_name, other_desc), 
    .fns = ~str_detect(.), regex(paste(search_string, collapse = "|"), ignore_case = TRUE)))



raw_df %>% 
  filter(str_detect,
         across(any_of(cust_name, other_desc),
         regex(paste(search_string, collapse = "|"), ignore_case = TRUE)))

Solution

  • Combine across with Reduce to select rows that have any occurrence of the pattern.

    library(dplyr)
    library(stringr)
    
    pat <- paste(search_string, collapse = "|")
    
    raw_df %>% 
      filter(Reduce(`|`, across(c(cust_name, other_desc), 
            ~str_detect(., regex(pat, ignore_case = TRUE)))))
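
    Why Reduce is needed: inside filter, across yields one logical column per selected column, and filter keeps a row only when all of them are TRUE. Reduce with `|` collapses those columns into a single "any column matched" vector. A minimal base R sketch of that step, using a made-up list of logical vectors (`hits` is a stand-in for what across produces, not real output):

```r
# Hypothetical per-column match results, one logical vector per column
hits <- list(
  cust_name  = c(TRUE, FALSE, FALSE),
  other_desc = c(FALSE, TRUE, FALSE)
)

# Fold the vectors together element-wise with OR:
# a row is TRUE if it matched in at least one column
any_hit <- Reduce(`|`, hits)
any_hit
```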
    

    However, I think if_any is more suitable here, as it was built to handle such cases (it is available from dplyr 1.0.4 onward):

    raw_df %>%
      filter(if_any(c(cust_name, other_desc), 
                    ~str_detect(., regex(pat, ignore_case = TRUE))))
    
    # cust_name other_desc trans   val
    #  <chr>     <chr>      <chr> <int>
    #1 Cisco     nothing    a       100
    #2 bad_cs    cisCo      s       101
    #3 Ibm       nothing    d       102
    #4 bad_ib    ibM        f       102
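
    Since the question mentions wrapping this in a function, here is one possible sketch. The helper name `filter_any_match`, its arguments, and the small `demo_df` are all made up for illustration; none of them come from dplyr or from the question. It assumes dplyr >= 1.0.4 and passes the columns through with `{{ }}`.

```r
library(dplyr)
library(stringr)

# Hypothetical helper (not part of dplyr): keep rows where any of the
# selected columns matches any of the search strings, ignoring case.
filter_any_match <- function(df, cols, strings) {
  # Build one case-insensitive alternation pattern, e.g. "ibm|cisco"
  pat <- regex(paste(strings, collapse = "|"), ignore_case = TRUE)
  df %>%
    filter(if_any({{ cols }}, ~ str_detect(.x, pat)))
}

# Small made-up data set for illustration
demo_df <- tibble(
  cust_name  = c("Cisco", "noth", "oraCle"),
  other_desc = c("nothing", "ibM", "Oracle"),
  val        = 1:3
)

matched <- filter_any_match(demo_df, c(cust_name, other_desc), c("ibm", "cisco"))
matched
```

    With the data from the question, the call would be filter_any_match(raw_df, c(cust_name, other_desc), search_string).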