I have a huge data frame. One of the columns in the data frame is an email address.
In addition, I have a vector of domain extensions, for example c(".ac", ".ad", ".ae", ".af", ".ag", ".ai"), with 259 extensions in total.
I want to filter my data frame so that it keeps only the records whose email ends with one of the strings in the extensions vector.
I tried the following, but neither attempt produced the desired result:
df %>%
  filter(endsWith(email, extensions))
df %>%
  filter(stringr::str_ends(email, extensions))
You can build a single regular expression from the extensions and use it for pattern matching:
ext <- c("ac","ad","ae","af","ag","ai")
df %>%
  filter(grepl(sprintf("\\.(%s)$", paste(ext, collapse = '|')), email))
where the sprintf() call builds a valid regex such as "\\.(ac|ad|ae|af|ag|ai)$", that is, a literal dot followed by one of the extensions, anchored at the end of the string.
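If your vector keeps the leading dots, as in the question, the same idea still works once each dot is escaped. Here is a minimal base R sketch. The sample data frame and its values are made up for illustration; only the column name email and the extensions come from the question.

```r
# Hypothetical sample data; only the column name `email` comes from the question.
df <- data.frame(
  id = 1:4,
  email = c("a@uni.ac", "b@example.com", "c@shop.ae", "d@mail.ad.org"),
  stringsAsFactors = FALSE
)

# Extensions with leading dots, as in the question (shortened to six entries).
extensions <- c(".ac", ".ad", ".ae", ".af", ".ag", ".ai")

# Escape each literal dot so that it is not treated as "any character".
escaped <- gsub(".", "\\.", extensions, fixed = TRUE)

# Join the alternatives and anchor the match at the end of the string.
pattern <- sprintf("(%s)$", paste(escaped, collapse = "|"))

# Keep only the rows whose email ends with one of the extensions.
result <- df[grepl(pattern, df$email), ]
```

With dplyr loaded, df %>% filter(grepl(pattern, email)) should select the same rows. Note that "d@mail.ad.org" is dropped, because ".ad" appears in the middle of the address rather than at the end.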