I'm new to text analysis, and this past week I have been struggling with a particular problem in R: how to remove or replace all variations of a word in a character vector. For example, if the vector is:
test <- c("development", "develop", "developing", "developer", "apples", "kiwi")
I want the end output to be:
"apples", "kiwi"
So, basically, I'm trying to remove or replace every word that begins with "develop" (the regex "^develop"). I tried str_remove_all from the stringr package with this expression:
library(stringr)
str_remove_all(test, "^dev")
But the end result was this:
"elopment", "elop", "eloping", "eloper", "apples", "kiwi"
It only removed the part of each word that matched the pattern "^dev", whereas I want to remove the entire word whenever it begins with "dev".
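The partial removal happens because a regex removal only deletes the characters the pattern matches, and "^dev" matches just those three letters. A minimal base-R sketch, assuming each element is a single word: extending the pattern with ".*" makes the match run to the end of the element, after which the emptied elements can be dropped. The same pattern should behave the same way in str_remove_all, since the regex is what matters. The variable names here are illustrative only.

```r
test <- c("development", "develop", "developing", "developer", "apples", "kiwi")

# "^develop.*" matches from "develop" to the end of the element,
# so the whole word is removed (leaving an empty string)
removed <- sub("^develop.*", "", test)

# drop the elements that became empty strings
kept <- removed[nzchar(removed)]

# to replace (rather than remove) every variant with a single word,
# consume the remaining word characters with \\w*
replaced <- sub("^develop\\w*", "develop", test)
```

Note that the first step leaves "" placeholders rather than shortening the vector, which is why the nzchar() filter is needed when the goal is removal.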
Thanks!
Use grep with invert = TRUE (and value = TRUE to return the elements rather than their indices):
grep("^develop", test, invert = TRUE, value = TRUE)
## [1] "apples" "kiwi"
or negate grepl to build a logical index:
ok <- !grepl("^develop", test)
test[ok]
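Because "develop" is a fixed prefix, the same logical-index idea can also be written without a regex. A small sketch using base R's startsWith() (available since R 3.3.0), which is a swapped-in alternative rather than part of the answer above; like grepl's default, it is case-sensitive:

```r
test <- c("development", "develop", "developing", "developer", "apples", "kiwi")

# startsWith() compares a literal prefix, so no regex is involved;
# negating it keeps the elements that do not begin with "develop"
kept <- test[!startsWith(test, "develop")]
```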
or strip the "develop" prefix with sub and keep only the elements that did not change:
test[sub("^develop", "", test) == test]
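The approaches above assume each element is one word. If the elements are instead multi-word strings, a sketch under that assumption is to anchor the pattern at a word boundary (\\b) and consume the rest of the word (\\w*) with gsub, then tidy the leftover spaces. The input vector below is hypothetical, for illustration only.

```r
# hypothetical multi-word input, for illustration only
sentences <- c("developers love developing tools",
               "apples and kiwi",
               "a developer said hello")

# \\b anchors at a word boundary and \\w* consumes the rest of the word,
# so every word starting with "develop" is deleted wherever it occurs
stripped <- gsub("\\bdevelop\\w*", "", sentences, perl = TRUE)

# collapse the leftover runs of whitespace and trim the ends
stripped <- trimws(gsub("\\s+", " ", stripped, perl = TRUE))

# to replace instead of remove, pass a replacement word
replaced <- gsub("\\bdevelop\\w*", "develop", sentences, perl = TRUE)
```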