I would like to find if any of the words from columns 1,2 or 3 exist in column 4 - if yes return 1 if no return 0. For example we will have l5 l6 in V1, l5 in V2 and l8 l45 in V3. V4 is l6 and l56 so the new column V5 will return 1 because there is a match - second word from V1 is l6 - new column will return 1.
Another example from below table - row 6 will return 1 into new column because it contains word l7 in V4 which exists in column V3. - I would like to always compare V4 against other columns for output. Thank you.
id V1 V2 V3 V4
1 l7 l7 l7 l7
2 l31 l7 l56 l7 l35 l22
3 l31 l7 l7 l31 l7 l7
4 l22 l16 l22 l28
5 l31 l31 l32 l31
6 l18 l48 l18 l7 l22 l28,l36,l7
7 l31 l31 l31 l32
8 l31 l10 l39 l31,l7
9 l7 l35 l31 l7 l35 l31
10 l36 l36 l36 l7 l36
We can split the columns with strsplit
, concatenate with the elements using Map
and then compare with the 5th column
as.integer(mapply(function(x, y) any(x %in% y),
strsplit(df1[,5], '[, ]'),
do.call(Map, c(f = c, lapply(df1[2:4], function(x) strsplit(x, "[, ]"))))))
#[1] 1 0 1 0 1 1 0 1 1 1
Or more compactly, paste
the rows (columns 2 to 4) together then do the strsplit
and compare
as.integer(sapply(Map(`%in%`, strsplit(df1[,5], '[, ]+'),
lapply(strsplit(do.call(paste, df1[2:4]), "[ ,]+"), unique)), any))
#[1] 1 0 1 0 1 1 0 1 1 1