
R - Find all vector elements that contain all strings / patterns - str_detect grep


Sample data

files.in.path = c("a.4.0. name 2015 - NY.RDS", 
                  "b.4.0. name 2016 - CA.RDS", 
                  "c.4.0. name 2015 - PA.RDS")
strings.to.find = c("4.0", "PA")

I want a logical vector that flags the elements containing all of the strings.to.find. The desired result:

FALSE FALSE TRUE

This code finds elements that contain any one of the strings.to.find, i.e., it uses an OR operator:

library(stringr)  # provides str_detect() and str_c()
str_detect(files.in.path, str_c(strings.to.find, collapse="|")) # OR operator
 TRUE TRUE TRUE

This code attempts to use an AND operator but does not work.

str_detect(files.in.path, str_c(strings.to.find, collapse="&")) # AND operator
FALSE FALSE FALSE

This works in several lines, and I could write a for loop to generate the individual lines for cases with a larger number of strings.to.find:

det.1 = str_detect(files.in.path, "4.0")
det.2 = str_detect(files.in.path, "PA")
det.all = det.1 & det.2
 FALSE FALSE TRUE

But is there a better way, one that does not involve a regex that depends on the position or order of the strings.to.find?


Solution

  • A web search for 'r regex "and operator"' or 'regex "and operator"' leads to "R grep: is there an AND operator?" and "Regular Expressions: Is there an AND operator?", respectively.

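    So, to match all of the patterns, wrap each string in a positive lookahead and concatenate the results into a single pattern. For the sample data this gives (?=.*4.0)(?=.*PA):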

    # Lookaheads require the PCRE engine, hence perl = TRUE
    pattern <- paste0("(?=.*", strings.to.find, ")", collapse = "")
    grepl(pattern, files.in.path, perl = TRUE)
    

    As Jota mentioned in a comment, matching "4.0" this way will also match other strings, because the period is a regex metacharacter that matches any character. One fix is to escape the period in the pattern strings, i.e. strings.to.find = c("PA", "4\\.0").
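
If the strings are meant to be matched literally rather than as regular expressions, one possible alternative (a sketch, not part of the original answer) is to test each string separately with base R's grepl() using fixed = TRUE and then combine the results with an elementwise AND. This sidesteps both the escaping issue and any dependence on the order of the strings. The variable names hits and det.all are only illustrative.

```r
files.in.path   <- c("a.4.0. name 2015 - NY.RDS",
                     "b.4.0. name 2016 - CA.RDS",
                     "c.4.0. name 2015 - PA.RDS")
strings.to.find <- c("4.0", "PA")

# One logical vector per search string. fixed = TRUE treats each string as
# literal text, so "." is not interpreted as a regex metacharacter.
hits <- lapply(strings.to.find, grepl, x = files.in.path, fixed = TRUE)

# Elementwise AND across all of the per-string results.
det.all <- Reduce(`&`, hits)
det.all
# FALSE FALSE TRUE
```

This works for any number of strings.to.find without a hand-written loop, since Reduce() folds `&` over however many logical vectors lapply() returns.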