I have two vectors in R:
list1 <- c("ABCDEF", "FEDCBA", "AA-BB-CCCC", "ABCDEFGH-IJK", "ZZZZ")
list2 <- c("ABCDEF", "FEDCBA:XA",
"AA-BB-CCCC-01","AA-BB-CCCC-21:ABC", "ABCDEFGH-IJK-1X",
"AKDWXFE-XXY")
I'd like to compare the two vectors, with list1 being the 'correct' one. If an item in list1 does not appear in list2, then print 'Add [item in list1]'; if an item in list2 is not in list1, then print 'delete [item in list2]'. I would like partial matches to count. For example, list1 has 'FEDCBA' and list2 has 'FEDCBA:XA'; this would be an acceptable partial match. The same goes for list2 having 'AA-BB-CCCC-21:ABC' while list1 has 'AA-BB-CCCC' (also an acceptable partial match).
This looks like homework to me, but OK, let us make it a teaching moment.
First, let us find out which elements of list1 have a match in list2. We will use grepl for that: grepl(pattern, list2) returns a logical vector with one TRUE/FALSE value for each element of list2, TRUE where that element contains the pattern.
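To see what grepl() returns for a single pattern, here is a small base-R illustration using the vectors from the question. The fixed = TRUE argument is my addition: it makes grepl() treat the pattern as a literal substring rather than a regular expression. Note that the match is not anchored, so "ABCDEF" is also found inside "ABCDEFGH-IJK-1X".

```r
list2 <- c("ABCDEF", "FEDCBA:XA", "AA-BB-CCCC-01", "AA-BB-CCCC-21:ABC",
           "ABCDEFGH-IJK-1X", "AKDWXFE-XXY")

# One TRUE/FALSE per element of list2: does it contain "FEDCBA"?
hits_fedcba <- grepl("FEDCBA", list2, fixed = TRUE)
print(hits_fedcba)
# [1] FALSE  TRUE FALSE FALSE FALSE FALSE

# Substring matching can hit anywhere in the string, not only at the start
hits_abcdef <- grepl("ABCDEF", list2, fixed = TRUE)
print(hits_abcdef)
# [1]  TRUE FALSE FALSE FALSE  TRUE FALSE

# any() collapses the vector to a single answer for this pattern
print(any(hits_fedcba))
# [1] TRUE
```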
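library(tidyverse)
# For each element of list1: is it contained in at least one element of list2?
# fixed = TRUE treats the pattern as a literal string, not a regular expression
list1_has_match <- map_lgl(list1, ~ any(grepl(.x, list2, fixed = TRUE)))
msg <- sprintf("Add [%s]", list1[ !list1_has_match ])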
In the above code, I use map_lgl to run the any(grepl(...)) expression for each element of list1 and return a logical vector. Any element that has a FALSE value in that vector has no (partial) match in list2 and should be added.
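If you would rather not load the tidyverse for this, the same step can be sketched in base R by swapping map_lgl() for vapply(), which also checks that every call returns exactly one logical value. As in the previous illustration, this assumes literal (fixed = TRUE) substring matching.

```r
list1 <- c("ABCDEF", "FEDCBA", "AA-BB-CCCC", "ABCDEFGH-IJK", "ZZZZ")
list2 <- c("ABCDEF", "FEDCBA:XA", "AA-BB-CCCC-01", "AA-BB-CCCC-21:ABC",
           "ABCDEFGH-IJK-1X", "AKDWXFE-XXY")

# For each pattern in list1: is it contained in at least one element of list2?
list1_has_match <- vapply(list1,
                          function(p) any(grepl(p, list2, fixed = TRUE)),
                          logical(1), USE.NAMES = FALSE)
print(list1_has_match)
# [1]  TRUE  TRUE  TRUE  TRUE FALSE

# Elements of list1 without any match are the ones to add
msg <- sprintf("Add [%s]", list1[!list1_has_match])
cat(msg, sep = "\n")
# Add [ZZZZ]
```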
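Next, we do the same the other way around. However, we still have to use the elements of list1 as the patterns, because it is the list2 entries that carry the extra suffixes (grepl("FEDCBA:XA", "FEDCBA") is FALSE). This is why the next step gets a bit more complicated. In each call within map_dfr, we generate a named logical vector for one element of list1, with one value (and one name) per element of list2. Since we use map_dfr, each of these vectors becomes a row in a data frame, so the columns of the result correspond to the elements of list2. A column that contains at least one TRUE belongs to an element of list2 that matched something in list1.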
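# One row per element of list1, one column per element of list2
map1 <- map_dfr(list1, ~ set_names(grepl(.x, list2, fixed = TRUE), list2))
# An element of list2 is kept if its column contains at least one TRUE
list2_has_match <- map_lgl(map1, any)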
msg <- c(msg,
sprintf("delete [%s]", list2[ !list2_has_match ]))
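The data frame built by map_dfr() is, in effect, a logical matrix with one row per element of list1 and one column per element of list2. Here is a base-R sketch that builds such a matrix directly with nested vapply() calls (again assuming fixed = TRUE substring matching; the name match_mat is only illustrative). A convenient side effect is that one matrix answers both questions: rows with no TRUE are the items to add, columns with no TRUE are the items to delete.

```r
list1 <- c("ABCDEF", "FEDCBA", "AA-BB-CCCC", "ABCDEFGH-IJK", "ZZZZ")
list2 <- c("ABCDEF", "FEDCBA:XA", "AA-BB-CCCC-01", "AA-BB-CCCC-21:ABC",
           "ABCDEFGH-IJK-1X", "AKDWXFE-XXY")

# match_mat[i, j] is TRUE when list1[i] occurs as a substring of list2[j].
# The outer vapply() walks over list2 (columns), the inner one over list1 (rows).
match_mat <- vapply(list2,
                    function(s) vapply(list1,
                                       function(p) grepl(p, s, fixed = TRUE),
                                       logical(1)),
                    logical(length(list1)))

# Rows without any TRUE -> "Add"; columns without any TRUE -> "delete"
list1_has_match <- rowSums(match_mat) > 0
list2_has_match <- colSums(match_mat) > 0

msg <- c(sprintf("Add [%s]", list1[!list1_has_match]),
         sprintf("delete [%s]", list2[!list2_has_match]))
cat(msg, sep = "\n")
# Add [ZZZZ]
# delete [AKDWXFE-XXY]
```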
Finally, print the messages:
cat(msg, sep="\n")
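For the vectors in the question, this prints "Add [ZZZZ]" and "delete [AKDWXFE-XXY]". Because grepl() matches anywhere in the string, an entry such as "XXABCDEF" would also count as a match for "ABCDEF". If, as in the question's examples, a partial match should only mean "the list2 entry starts with the list1 entry", base R's startsWith() can take the place of grepl(). The function below, compare_lists(), is a hypothetical helper (not from any package) that wraps the whole comparison with a prefix_only switch.

```r
# compare_lists() is a hypothetical helper written for this example.
# reference: the 'correct' vector; current: the vector checked against it.
# prefix_only = TRUE counts a match only when a current entry starts with a
# reference entry; FALSE allows it anywhere (literal substring, fixed = TRUE).
compare_lists <- function(reference, current, prefix_only = FALSE) {
  # Does the single pattern p match each element of s?
  is_match <- function(p, s) {
    if (prefix_only) startsWith(s, p) else grepl(p, s, fixed = TRUE)
  }
  # m[i, j] is TRUE when reference[i] matches current[j]
  m <- matrix(FALSE, nrow = length(reference), ncol = length(current))
  for (i in seq_along(reference)) m[i, ] <- is_match(reference[i], current)
  c(sprintf("Add [%s]", reference[rowSums(m) == 0]),
    sprintf("delete [%s]", current[colSums(m) == 0]))
}

list1 <- c("ABCDEF", "FEDCBA", "AA-BB-CCCC", "ABCDEFGH-IJK", "ZZZZ")
list2 <- c("ABCDEF", "FEDCBA:XA", "AA-BB-CCCC-01", "AA-BB-CCCC-21:ABC",
           "ABCDEFGH-IJK-1X", "AKDWXFE-XXY")

cat(compare_lists(list1, list2), sep = "\n")
# Add [ZZZZ]
# delete [AKDWXFE-XXY]

# The two modes differ when the pattern sits in the middle of a string
cat(compare_lists("BCD", "ABCDE", prefix_only = TRUE), sep = "\n")
# Add [BCD]
# delete [ABCDE]
```

For the question's data both modes give the same two messages; the choice only matters once list1 entries can appear in the middle of list2 entries.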