Tags: r, regex, tidyverse, stringr

Regex for at least one instance of each of a list of letters?


I'm trying to sharpen my regular-expression skills by writing some R code to solve the NY Times' Spelling Bee game.

I've done that, but now I'm going one step further and trying to identify what the game calls "pangrams": words that contain at least one instance of each of a set of seven letters.

I was hoping to do this with str_detect() and a regex, but I'm not seeing a way to say "at least one of each of these letters."

Per the second example here, str_detect() can be vectorised over a vector of letters (patterns), but I run into problems when the strings I want to test are a column of words in a tibble.

This does not work (it should identify "pedagogy" as the pangram):

library(tidyverse)

required_letters <- c("o", "a", "d", "e", "g", "p", "y")
list_of_words <- tibble(word = c("pedagogy", "agog", "apogee", "dodge"))

pangrams <- list_of_words %>%
  filter(all(str_detect(word, required_letters)))

But I was hoping it would work in the way that this does:

all(str_detect("pedagogy", required_letters))
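The filter() call above fails because str_detect() pairs the word column (4 values) with required_letters (7 values) element by element. Depending on the stringr version this raises a recycling error or warning, and even when it runs, all() collapses the whole column to a single TRUE or FALSE. A minimal sketch that keeps the all(str_detect()) idea, assuming purrr's map_lgl() for the row-by-row step (any per-element apply would work), could look like this:

```r
library(dplyr)
library(purrr)
library(stringr)
library(tibble)

required_letters <- c("o", "a", "d", "e", "g", "p", "y")
list_of_words <- tibble(word = c("pedagogy", "agog", "apogee", "dodge"))

# For each single word, str_detect() returns one TRUE/FALSE per required
# letter; all() reduces that to one value, so filter() gets one per row.
pangrams <- list_of_words %>%
  filter(map_lgl(word, ~ all(str_detect(.x, fixed(required_letters)))))

pangrams
```

This keeps only the "pedagogy" row. fixed() treats each letter as a literal string instead of a regex, which is safe even if the letter list ever contains regex metacharacters.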

Solution

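  • In regex, you can build the pattern from one lookahead per letter. Each (?=.*x) is a zero-width assertion that checks that x occurs somewhere after the current position without consuming any characters, so all seven lookaheads are tested from the same starting point and a word matches only if it contains every letter: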

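    # One lookahead per letter: "(?=.*o)(?=.*a)(?=.*d)(?=.*e)(?=.*g)(?=.*p)(?=.*y)"
    pattern <- str_c("(?=.*", required_letters, ")", collapse = "")
    
    # Keep only the words for which every lookahead succeeds
    list_of_words %>%
      filter(str_detect(word, pattern))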
    
    # A tibble: 1 × 1
      word    
      <chr>   
    1 pedagogy
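To see the assembled pattern, and as a sketch of the same check without stringr, base R's grepl() can run it. This assumes perl = TRUE: the default (TRE) engine behind grepl() does not support lookaheads, whereas stringr uses ICU, which does.

```r
required_letters <- c("o", "a", "d", "e", "g", "p", "y")
words <- c("pedagogy", "agog", "apogee", "dodge")

# Same construction as the str_c() call: one zero-width lookahead per letter
pattern <- paste0("(?=.*", required_letters, ")", collapse = "")
print(pattern)
# [1] "(?=.*o)(?=.*a)(?=.*d)(?=.*e)(?=.*g)(?=.*p)(?=.*y)"

# perl = TRUE switches to PCRE, which understands (?=...)
is_pangram <- grepl(pattern, words, perl = TRUE)
print(words[is_pangram])
# [1] "pedagogy"
```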