Find the intersection between strings in 2 columns

I am trying to find the common words between 2 columns for each row in a data frame. For example, my input is:

C1                | C2
Roy goes to Japan | Roy goes to Australia 
I go to Japan     | He goes to Japan

and I need a column appended like this:

C1                | C2                    | Result
Roy goes to Japan | Roy goes to Australia | Roy goes to
I go to Japan     | He goes to Japan      | to Japan

I tried intersect, but it gives me the intersection of C1 and C2 as whole columns, not the common words within each pair of C1 and C2 elements. I guess I'll have to use something from stringr or stringi, but I'm not sure what. Also, my dataset is huge, so something fast would be nice.
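The reason plain intersect doesn't help is that it compares whole elements: each cell is a single string, so it looks for sentences that appear verbatim in both columns. A minimal illustration with the example data (the data frame is rebuilt here so the snippet runs on its own):

```r
# Example data, same values as in the "data" section below
df <- data.frame(
  C1 = c("Roy goes to Japan", "I go to Japan"),
  C2 = c("Roy goes to Australia", "He goes to Japan"),
  stringsAsFactors = FALSE
)

# intersect() treats each cell as one element, so it compares whole sentences
whole <- intersect(df$C1, df$C2)
print(whole)
# character(0): no sentence appears unchanged in both columns
```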

Solution

You could split each string on whitespace with strsplit and then use intersect on the resulting word vectors to find the common words in each row.
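For a single row, the idea looks like this when done by hand on the first pair of sentences. The sketch splits on '\\s+' rather than '\\s', an assumption made so that runs of several spaces count as one separator:

```r
# Split each sentence of the first row into a vector of words
words1 <- strsplit("Roy goes to Japan", "\\s+")[[1]]
words2 <- strsplit("Roy goes to Australia", "\\s+")[[1]]

# Words present in both, in the order they appear in words1
common <- intersect(words1, words2)

# Glue the common words back into one string
result <- paste(common, collapse = " ")
print(result)
```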

# Split both columns into words, then intersect the word vectors row by row
df$result <- mapply(function(x, y) paste0(intersect(x, y), collapse = " "),
                    strsplit(df$C1, '\\s+'), strsplit(df$C2, '\\s+'))
df
#                 C1                    C2      result
#1 Roy goes to Japan Roy goes to Australia Roy goes to
#2     I go to Japan      He goes to Japan    to Japan
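Two details of intersect are worth knowing here: repeated words collapse to a single occurrence, and the result follows the word order of the first column. A row with no words in common gives an empty string. A small check, wrapping the same mapply approach in a helper whose name, common_words, is purely illustrative:

```r
# Hypothetical helper mirroring the mapply() approach above
common_words <- function(a, b) {
  mapply(function(x, y) paste0(intersect(x, y), collapse = " "),
         strsplit(a, "\\s+"), strsplit(b, "\\s+"))
}

res <- common_words(c("to be or not to be", "Hello world"),
                    c("be quick to act", "Goodbye moon"))
print(res)
# Row 1: "to" and "be" appear once each, in C1's order
# Row 2: nothing in common, so the result is ""
```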

You could also do this with the tidyverse:

library(tidyverse)
df %>%
  mutate(result = map2_chr(str_split(C1, '\\s+'), str_split(C2, '\\s+'), 
                           ~str_c(intersect(.x, .y), collapse = " ")))
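Since the dataset is huge, another base R option worth trying (a sketch, not benchmarked here) splits each column once with strsplit, which is vectorised over the whole column, and then loops over rows with vapply, which also guarantees that every row returns exactly one string. It assumes C1 and C2 are character columns; factor columns would need as.character() first.

```r
df <- data.frame(
  C1 = c("Roy goes to Japan", "I go to Japan"),
  C2 = c("Roy goes to Australia", "He goes to Japan"),
  stringsAsFactors = FALSE
)

# Split each column once; each element becomes a vector of words
w1 <- strsplit(df$C1, "\\s+")
w2 <- strsplit(df$C2, "\\s+")

# vapply() checks that each row yields a single character value
df$result <- vapply(seq_along(w1),
                    function(i) paste(intersect(w1[[i]], w2[[i]]), collapse = " "),
                    character(1))
print(df)
```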

data

df <- structure(list(C1 = c("Roy goes to Japan", "I go to Japan"), 
    C2 = c("Roy goes to Australia", "He goes to Japan")), row.names = c(NA, 
-2L), class = "data.frame")