Search code examples
rdataframematchwords

Does word exist in other column?


I would like to find if any of the words from columns 1,2 or 3 exist in column 4 - if yes return 1 if no return 0. For example we will have l5 l6 in V1, l5 in V2 and l8 l45 in V3. V4 is l6 and l56 so the new column V5 will return 1 because there is a match - second word from V1 is l6 - new column will return 1.

Another example from below table - row 6 will return 1 into new column because it contains word l7 in V4 which exists in column V3. - I would like to always compare V4 against other columns for output. Thank you.

   id                 V1      V2         V3                 V4
   1                  l7      l7         l7                 l7
   2              l31 l7 l56  l7         l35                l22
   3              l31 l7      l7       l31 l7               l7
   4                 l22      l16        l22                l28
   5                 l31      l31        l32                l31
   6             l18 l48      l18      l7 l22         l28,l36,l7
   7                 l31      l31        l31                l32
   8                 l31      l10        l39             l31,l7
   9              l7 l35      l31      l7 l35               l31
   10                l36      l36      l36 l7               l36

Solution

  • We can split the columns with strsplit, concatenate with the elements using Map and then compare with the 5th column

    as.integer(mapply(function(x, y) any(x %in% y), 
                   strsplit(df1[,5], '[, ]'), 
        do.call(Map, c(f = c, lapply(df1[2:4], function(x) strsplit(x, "[, ]"))))))
    #[1] 1 0 1 0 1 1 0 1 1 1
    

    Or more compactly, paste the rows (columns 2 to 4) together then do the strsplit and compare

    as.integer(sapply(Map(`%in%`, strsplit(df1[,5], '[, ]+'), 
        lapply(strsplit(do.call(paste, df1[2:4]), "[ ,]+"), unique)), any))
    #[1] 1 0 1 0 1 1 0 1 1 1