Having trouble getting my head around this one, but I sense the answer uses stringr::str_subset.
Here's an example of what I'm trying to achieve:
word_list <- c("amber", "flora", "glide", "quake", "slant")
word_neg <- "aside"
word_list_pruned <- some_function(word_list, word_neg)
> word_list_pruned
> c("flora", "slant")
I want to take a list of words, word_list, and a word, word_neg (here, "aside"), and remove every word in word_list that has at least one letter in the same position as in word_neg.
Any ideas?
One option would be to use a regex approach. Given the negative word aside, we can build the following regex alternation:
^(?:a....|.s...|..i..|...d.|....e)$
Any word which does not match this alternation should be retained as a match.
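As a quick sanity check of that idea, the alternation can be written out by hand and tested against the sample words. This is only a sketch; passing perl = TRUE is my own choice here so that the (?:...) group is handled by PCRE.

```r
word_list <- c("amber", "flora", "glide", "quake", "slant")

# Hand-written alternation for the negative word "aside"
pattern <- "^(?:a....|.s...|..i..|...d.|....e)$"

# TRUE means the word shares a letter and position with "aside" and should be dropped
hits <- grepl(pattern, word_list, perl = TRUE)
setNames(hits, word_list)
# amber flora glide quake slant
#  TRUE FALSE  TRUE  TRUE FALSE
```

Only "flora" and "slant" come back FALSE, so those are the two words to keep.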
word_list <- c("amber", "flora", "glide", "quake", "slant")
word_neg <- "aside"
# One sub-pattern per position: dots everywhere except the letter at position x
n <- nchar(word_neg)
patterns <- sapply(seq_len(n), function(x) {
paste0(strrep(".", x - 1), substr(word_neg, x, x), strrep(".", n - x))
})
pattern <- paste0("^(?:", paste(patterns, collapse="|"), ")$")
word_list_pruned <- word_list[!grepl(pattern, word_list)]
word_list_pruned
[1] "flora" "slant"
The string manipulation inside the call to sapply() generates the regex alternation. For each position, we start with a run of dots as long as the negative word (here, .....) and put back the single letter of word_neg that sits at that position.
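If you would rather avoid building a regex at all, the same filtering can probably be done by comparing characters position by position. The sketch below is a different technique from the one above, not a rewrite of it; the helper name prune_by_position is hypothetical, and I am assuming that words of a different length than word_neg should be compared only over the positions they share.

```r
# Hypothetical helper (the name is illustrative, not from any package).
# Keeps only the words that share no letter in the same position with word_neg.
prune_by_position <- function(word_list, word_neg) {
  neg_chars <- strsplit(word_neg, "")[[1]]
  keep <- vapply(strsplit(word_list, ""), function(chars) {
    # Assumption: compare only the positions present in both words
    n <- min(length(chars), length(neg_chars))
    !any(chars[seq_len(n)] == neg_chars[seq_len(n)])
  }, logical(1))
  word_list[keep]
}

word_list <- c("amber", "flora", "glide", "quake", "slant")
prune_by_position(word_list, "aside")
# [1] "flora" "slant"
```

One side benefit of this version is that word_neg is never interpreted as a pattern, so characters that are special in a regex need no escaping.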