
Check if string starts with any string in list of strings


I have a list of word stems, e.g.:

stems <- c("fri", "odd", "inspi")

I want to see if a word starts with any of those stems, and then return that stem. For example, "fright" begins with "fri", so I want to return "fri".

On the other hand, while "todd" contains "odd", it does not start with "odd", and so I wouldn't want to return anything.
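To make the distinction concrete, here is a minimal sketch in base R that contrasts "contains" with "starts with". The variable names `contains_odd` and `starts_odd` are made up for illustration; `grepl()` and `startsWith()` are standard base R functions.

```r
# "todd" contains "odd" somewhere, so a plain substring test succeeds
contains_odd <- grepl("odd", "todd", fixed = TRUE)

# but "todd" begins with "t", so a prefix test fails
starts_odd <- startsWith("todd", "odd")

print(c(contains = contains_odd, starts = starts_odd))
```

Only the prefix test gives the behaviour wanted here, which is why a simple substring search is not enough.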

Is there a way to accomplish this? I've tried str_starts() where the pattern argument is a list, but that doesn't seem to work.

Duplicates are not an issue in my data.

As a minimal example, if my data looks like:

dat <- tibble(complete_word = c("fright", "todd", "quirky", "oddly"))

I would want to return:

dat <- tibble(complete_word = c("fright", "todd", "quirky", "oddly"),
              stem = c("fri", NA, NA, "odd"))

Solution

  • Here's one way with the tidyverse. Use map() with str_starts() (from stringr) to get the index of the matching stem, if any, in the vector stems, then use that index to look up the stem.

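    library(dplyr)
    library(purrr)
    library(stringr)  # provides str_starts()
    
    dat %>% 
      # idx: position in `stems` of the stem each word starts with (empty if none)
      mutate(idx = map(complete_word, ~ which(str_starts(.x, stems))), 
             # as.integer() turns each empty match into NA, so stem is NA there
             stem = stems[as.integer(idx)])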
    

    Result:

    # A tibble: 4 × 3
      complete_word idx       stem 
      <chr>         <list>    <chr>
    1 fright        <int [1]> fri  
    2 todd          <int [0]> NA   
    3 quirky        <int [0]> NA   
    4 oddly         <int [1]> odd
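As an alternative, the same lookup can be sketched in base R with `startsWith()`, which compares literal prefixes rather than regular expressions. This is only an illustrative sketch, not part of the original answer: the helper name `first_stem` is hypothetical, and it assumes that if several stems match, returning the first one is acceptable.

```r
stems <- c("fri", "odd", "inspi")
words <- c("fright", "todd", "quirky", "oddly")

# Hypothetical helper: return the first stem that `word` starts with,
# or NA if no stem is a prefix of the word.
first_stem <- function(word, stems) {
  hits <- stems[startsWith(word, stems)]  # literal prefix test against each stem
  if (length(hits) > 0) hits[1] else NA_character_
}

# Apply the helper to every word; vapply() guarantees a character vector
stem <- vapply(words, first_stem, character(1), stems = stems, USE.NAMES = FALSE)

print(data.frame(complete_word = words, stem = stem))
```

Because `startsWith()` does no regex interpretation, this variant would also be safe for stems containing characters such as `.` or `(`.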