Apply regular expressions to compare values in data frames of different length in R


I am trying to apply a regular expression to match values in two data frames of different length in R. My objective is to retain only the values that match the regex in both data frames.

An example of the dataset would be:

a<-c('item1','item2','item4')
b<-c('item1','\t item2','item3','item4')

I tried grepl(a$. , b$.), but it only works for the first row. To explain: the values in the two columns share a common kernel name, but there might be small differences, so I do need some kind of regex.

If the code worked, the new object 'c' (which could also be a filtered version of a) would be:

c<-c('item1','item2','item4')

Peace to you


Solution

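  • We could paste the elements of 'a' into a single regular-expression pattern, with the alternatives separated by |, and use that in grep. With value = TRUE, grep returns the matching elements of 'b' rather than their positions.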

    grep(paste(a, collapse = "|"), b, value = TRUE)
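
Note that the call above returns the matching elements of 'b' as they are, so '\t item2' keeps its leading whitespace. The following is a minimal sketch of two ways to get the cleaned-up result from the question, assuming the example vectors as given. The names pattern, matched_b, in_b and c_filtered are only illustrative and not part of the original answer.

```r
a <- c('item1', 'item2', 'item4')
b <- c('item1', '\t item2', 'item3', 'item4')

# Combine the elements of 'a' into one alternation pattern: "item1|item2|item4"
pattern <- paste(a, collapse = "|")

# Elements of 'b' that contain any of the alternatives (whitespace is kept)
matched_b <- grep(pattern, b, value = TRUE)

# Option 1: strip leading/trailing whitespace from the matches in 'b'
trimmed_b <- trimws(matched_b)

# Option 2: keep the elements of 'a' that occur somewhere in 'b'
in_b <- vapply(a, function(p) any(grepl(p, b)), logical(1))
c_filtered <- a[in_b]
```

Both options rely on substring matching, so a pattern such as 'item1' would also match a value like 'item10'. If that matters for the real data, the pattern may need anchors or word boundaries.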