Tags: r, dplyr, tidyverse, stringr, grepl

Filtering a tibble in R by list of strings and returning all records that end with the strings in the list


I have a large data frame, and one of its columns holds email addresses. I also have a vector of domain extensions, for example c(".ac", ".ad", ".ae", ".af", ".ag", ".ai"), with 259 extensions in total. I want to filter the data frame so that it keeps only the records whose email ends with one of the strings in that vector.

I tried several options, including the two below, but none of them produced the desired result:

df %>%
  filter(endsWith(email, extensions))

df %>%
  filter(stringr::str_ends(email, extensions))

Solution

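  • Both attempts fail because endsWith() and str_ends() are vectorised over their second argument: each email is compared with the extension in the same position, not with every extension. A single regular expression that lists all the extensions avoids this: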

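    library(dplyr)

    # Extensions without the leading dot; the pattern below supplies an escaped one
    ext <- c("ac", "ad", "ae", "af", "ag", "ai")

    # Keep rows whose email ends with "." followed by any of the extensions
    df %>%
      filter(grepl(sprintf("\\.(%s)$", paste(ext, collapse = "|")), email))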
    

    where the sprintf() call builds a valid regular expression (a literal dot, then any one of the extensions, anchored to the end of the string by $), such as

    "\\.(ac|ad|ae|af|ag|ai)$"