I have a list of word stems, e.g.:
stems <- c("fri", "odd", "inspi")
I want to see if a word starts with any of those stems, and then return that stem. For example, "fright" begins with "fri", so I want to return "fri".
On the other hand, while "todd" contains "odd", it does not start with "odd", so I wouldn't want to return anything.
Is there a way to accomplish this? I've tried str_starts() with a list as the pattern argument, but that doesn't seem to work.
As a minimal example, if my data looks like:
dat <- tibble(complete_word = c("fright", "todd", "quirky", "oddly"))
I would want to return:
dat <- tibble(complete_word = c("fright", "todd", "quirky", "oddly"),
              stem = c("fri", NA, NA, "odd"))
Here's one way with the tidyverse. Use map() with str_starts() to get the index of the matching stem, if any, from the vector stems.
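library(dplyr)
library(purrr)
library(stringr)  # provides str_starts()
dat %>%
  # idx: position in stems of the stem each word starts with (empty if none)
  mutate(idx = map(complete_word, ~ which(str_starts(.x, stems))),
         # an empty idx is coerced to NA, so the lookup returns NA
         stem = stems[as.integer(idx)])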
Result:
# A tibble: 4 × 3
  complete_word idx       stem
  <chr>         <list>    <chr>
1 fright        <int [1]> fri
2 todd          <int [0]> NA
3 quirky        <int [0]> NA
4 oddly         <int [1]> odd
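To see what the per-word step is doing, here is a minimal base R sketch of the same idea. It swaps in startsWith() for str_starts() so it runs without any packages, and the helper name first_matching_stem is made up for illustration. It also assumes that when a word matches more than one stem, you want the first one listed in stems.

```r
stems <- c("fri", "odd", "inspi")
words <- c("fright", "todd", "quirky", "oddly")

# Testing one word against every stem gives one logical per stem
startsWith("fright", stems)

# Hypothetical helper: the first stem that `word` starts with, or NA if none
first_matching_stem <- function(word, stems) {
  hits <- stems[startsWith(word, stems)]
  if (length(hits) == 0) NA_character_ else hits[1]
}

# Apply the helper to each word; character(1) enforces one string per word
result <- vapply(words, first_matching_stem, character(1),
                 stems = stems, USE.NAMES = FALSE)
result
```

Because the helper always returns a single string, it could also be used inside mutate() with map_chr() instead of going through an intermediate list column.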