I have two sets of strings, Char and Char2 in this example. I am trying to find out whether Char includes at least two words from Char2 (any two words can match). I have not yet reached the "at least two words" part, because I must first figure out how to match any word in each string. Any help would be greatly appreciated.
I have tried using the stringr package in a couple of different ways; please see below. I tried solutions similar to the one Robert gave in this question: Detect multiple strings with dplyr and stringr
library(stringr)

# Build both character columns in one step
shop <- data.frame(
  Char  = c("good apples", "bag of apples", "bag of sugar", "milk x2"),
  Char2 = c("good pears", "bag of sugar", "bag of flour", "sour milk x2"),
  stringsAsFactors = FALSE
)
# First attempt
sapply(shop$Char, function(x) any(sapply(shop$Char2, str_detect, string = x)))
# Second attempt
str_detect(shop$Char, paste(shop$Char2, collapse = '|'))
I get these results:
sapply(shop$Char, function(x) any(sapply(shop$Char2, str_detect, string = x)))
good apples bag of apples bag of sugar milk x2
FALSE FALSE TRUE FALSE
str_detect(shop$Char, paste(shop$Char2, collapse = '|'))
FALSE FALSE TRUE FALSE
However, I am looking for these results:
FALSE TRUE TRUE TRUE
1) FALSE because only 1 word matches
2) TRUE because "bag of" is in both
3) TRUE because "bag of" is in both
4) TRUE because "milk x2" is in both
Here is a function that could help. It splits both strings on spaces and checks whether they share at least two distinct words:
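match_test <- function(string1, string2) {
  # Split each string into its space-separated words
  words1 <- unlist(strsplit(string1, ' '))
  words2 <- unlist(strsplit(string2, ' '))
  # intersect() returns the distinct words present in both
  common_words <- intersect(words1, words2)
  # TRUE when at least two distinct words are shared
  length(common_words) > 1
}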
Here is an example that compares the two vectors element by element:
string1 <- c("good apples" , "bag of apples", "bag of sugar", "milk x2")
string2 <- c("good pears" , "bag of sugar", "bag of flour", "sour milk x2")
vapply(seq_along(string1), function (k) match_test(string1[k], string2[k]), logical(1))
# [1] FALSE TRUE TRUE TRUE
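If the strings live in a data frame like the one in the question, the same word-overlap idea can be applied row by row. The following is only a sketch under a few assumptions: `count_common_words` and the new columns `n_common` and `match` are hypothetical names that do not appear in the question, and words are assumed to be separated by single spaces. Returning the count rather than a logical makes it easy to change the threshold later.

```r
# Hypothetical helper (name not from the original post):
# counts the distinct words shared by two strings.
count_common_words <- function(string1, string2) {
  words1 <- unlist(strsplit(string1, " ", fixed = TRUE))
  words2 <- unlist(strsplit(string2, " ", fixed = TRUE))
  length(intersect(words1, words2))
}

shop <- data.frame(
  Char  = c("good apples", "bag of apples", "bag of sugar", "milk x2"),
  Char2 = c("good pears", "bag of sugar", "bag of flour", "sour milk x2"),
  stringsAsFactors = FALSE
)

# mapply() walks both columns in parallel, one row at a time.
shop$n_common <- mapply(count_common_words, shop$Char, shop$Char2,
                        USE.NAMES = FALSE)

# Assumed threshold: at least two shared words, as asked in the question.
shop$match <- shop$n_common >= 2

print(shop)
```

With the sample data, `shop$match` should come out as FALSE, TRUE, TRUE, TRUE, matching the results requested in the question.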