
Sparklyr Spark SQL Filter based on multiple wildcards


Using sparklyr, I'm trying to find a way to filter a Spark dataframe with something that combines the functionality of rlike and %in%. Here is a minimal working example:

# start a Spark session in R and have dplyr loaded
library(sparklyr)
library(dplyr)
sc <- spark_connect(master = "local")

# create a spark dataframe
df <- data.frame(names = c("Brandon", "Chee", "Brandi", "Firouz", "Eric", "Erin"),
                 place = c("Pasadena", "South Bay", "West Hollywood", "SF Valley", "South Bay", "South Bay"))
sc_df <- sdf_copy_to(sc, df, overwrite = TRUE)

# set wildcard filter parameters
f_params <- c("Brand", "Er")

# return all rows of sc_df where the 'names' value contains any of the 'f_params' values.
df_filtered <- sc_df %>% 
  filter(rlike(names, f_params)) %>% 
  collect()

The filter above obviously fails, since rlike() expects a single pattern string rather than a character vector. Ideally, the df_filtered table would look like:

print(df_filtered)
# names     place           
# Brandon   Pasadena      
# Brandi    West Hollywood
# Eric      South Bay     
# Erin      South Bay 

Additional constraint: because the real example includes about 200 values in f_params, I can't use the following solution:

df_filtered <- sc_df %>% 
  filter(rlike(names, "Brand") | rlike(names, "Er")) %>% 
  collect()

Thanks in advance.


Solution

  • I can't use multiple rlike() statements separated with | (OR) because the real example includes about 200 values in f_params

    That sounds like a rather artificial constraint, but if you really want to avoid a single regular expression, you can always compose an explicit disjunction:

    library(rlang)
    
    sc_df %>% 
      filter(!!rlang::parse_quo(glue::glue_collapse(glue::glue(
        "(names %rlike% '{f_params}')"),  # one %rlike% clause per pattern
        " %or% "  # or " | "
      ), rlang::caller_env()))
    
    # Source: spark<?> [?? x 2]
      names   place         
      <chr>   <chr>         
    1 Brandon Pasadena      
    2 Brandi  West Hollywood
    3 Eric    South Bay     
    4 Erin    South Bay 
    

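    To see what parse_quo receives, you can print the composed expression string on its own (a quick sketch, reusing the f_params defined in the question):

    glue::glue_collapse(glue::glue("(names %rlike% '{f_params}')"), " %or% ")
    # (names %rlike% 'Brand') %or% (names %rlike% 'Er')

    With ~200 values this expands into ~200 OR-ed clauses, which is exactly why the single concatenated pattern shown next is preferable when it is safe.
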
    If the f_params values are guaranteed to be valid regex literals, it should be much faster to simply concatenate them into a single pattern:

    sc_df %>% 
      filter(names %rlike% glue::glue_collapse(glue::glue("{f_params}"), "|"))
    
    # Source: spark<?> [?? x 2]
      names   place         
      <chr>   <chr>         
    1 Brandon Pasadena      
    2 Brandi  West Hollywood
    3 Eric    South Bay     
    4 Erin    South Bay  
    
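    Since glue::glue("{f_params}") just passes each value through unchanged here, a plain base R paste() is an equivalent way to build the same "Brand|Er" alternation (a minimal sketch):

    sc_df %>% 
      filter(names %rlike% paste(f_params, collapse = "|"))

    Either way Spark evaluates a single RLIKE predicate against one compiled pattern, rather than one predicate per value.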

    If not, you can apply Hmisc::escapeRegex first:

    sc_df %>% 
      filter(
        names %rlike% glue::glue_collapse(glue::glue(
          "{Hmisc::escapeRegex(f_params)}"
        ), "|")
      )
    

    but please keep in mind that Spark uses Java regular expressions, while Hmisc::escapeRegex targets R's, so it might not cover some edge cases.
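
    For example (a small illustration, assuming Hmisc is installed), an unescaped "Brand(" is not a valid pattern at all, and the dot in "Er." would match any character:

    Hmisc::escapeRegex(c("Er.", "Brand("))
    # [1] "Er\\."    "Brand\\("

    After escaping, both values only match themselves as literal strings.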