I need to know which of the words in a vector comes first in a string. I need to run this code on a large data frame with millions of records.
Here is my sample data, df:
df <- data.frame(ID = c(1, 2, 3),
                 Text = c("A basket of fruits having apples, green bananas, and peaches",
                          "A basket of fruits having green bananas, apples, and peaches",
                          "A basket of fruits having peaches, green bananas, and apples"))
The words I am looking to match are in a vector:
vec <- c("green bananas", "apples", "peaches")
I want a Result column that holds, for each record, whichever of these words appears first in Text:
df$Result
"apples", "green bananas", "peaches"
You can use regmatches + regexpr like below
transform(
  df,
  Result = regmatches(Text, regexpr(paste0(vec, collapse = "|"), Text))
)
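One caveat with millions of records: regmatches drops elements that have no match, so the transform call fails with a length mismatch as soon as one row contains none of the words. A minimal sketch of a variant that fills those rows with NA instead, assuming a hypothetical fourth row with no match added purely for illustration:

```r
# Sample data plus one illustrative row that contains none of the words
df <- data.frame(ID = c(1, 2, 3, 4),
                 Text = c("A basket of fruits having apples, green bananas, and peaches",
                          "A basket of fruits having green bananas, apples, and peaches",
                          "A basket of fruits having peaches, green bananas, and apples",
                          "A basket of fruits having pears"),
                 stringsAsFactors = FALSE)
vec <- c("green bananas", "apples", "peaches")

# Position of the leftmost match per row; -1 where nothing matches
m <- regexpr(paste0(vec, collapse = "|"), df$Text)
hit <- !is.na(m) & m != -1L

# Start from NA and fill in only the rows that matched
df$Result <- NA_character_
df$Result[hit] <- regmatches(df$Text, m)
```

Rows without a match keep NA, which is also what str_extract returns in that case.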
or str_extract (from stringr, with dplyr for %>% and mutate)
library(dplyr)
library(stringr)
df %>%
  mutate(Result = str_extract(Text, paste0(vec, collapse = "|")))
which gives
  ID                                                         Text        Result
1  1 A basket of fruits having apples, green bananas, and peaches        apples
2  2 A basket of fruits having green bananas, apples, and peaches green bananas
3  3 A basket of fruits having peaches, green bananas, and apples       peaches
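Both approaches treat the entries of vec as regular expressions, so a word containing characters such as . or ( would need escaping first. If the words are meant literally, an alternative sketch is to compare match positions directly with fixed = TRUE. The names best_pos and best_word below are illustrative, and this is not necessarily faster than the single-regex versions:

```r
df <- data.frame(ID = c(1, 2, 3),
                 Text = c("A basket of fruits having apples, green bananas, and peaches",
                          "A basket of fruits having green bananas, apples, and peaches",
                          "A basket of fruits having peaches, green bananas, and apples"),
                 stringsAsFactors = FALSE)
vec <- c("green bananas", "apples", "peaches")

# Earliest position seen so far and the word found there, per row
best_pos  <- rep(NA_integer_, nrow(df))
best_word <- rep(NA_character_, nrow(df))

# Loop over the (few) words; each step is vectorised over all rows
for (w in vec) {
  # Start of the first literal occurrence of w, -1 if absent
  p <- as.integer(regexpr(w, df$Text, fixed = TRUE))
  # Rows where w occurs and starts earlier than the current best
  better <- !is.na(p) & p > 0 & (is.na(best_pos) | p < best_pos)
  best_pos[better]  <- p[better]
  best_word[better] <- w
}

df$Result <- best_word
```

Rows where none of the words occur end up with NA in Result. If two words start at the same position, the one listed earlier in vec wins.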