I am trying to match the following in R using str_detect from the stringr package.
I want to detect whether a given string is followed or preceded by 'and' or '&'. For example, in:
string_1<-"A and B"
string_2<-"A B"
string_3<-"B and A"
string_4<-"A B and C"
I want str_detect(string_X) to be FALSE for string_1, string_3 and string_4 but TRUE for string_2.
I have tried:
str_detect(string_X,paste0(".*(?<!and |& )","A"))==TRUE & str_detect(string_X,paste0(".*","A","(?! and| &).*"))==TRUE
I use paste0 because I want to run this over different strings. This works for all the cases above except string_4. I am new to regex, and it also does not seem very elegant. Is there a more general solution?
Thank you.
You can use two negative lookahead assertions to make sure that there is no A or B followed by and or &, and none in the other order either.
^(?!.*[AB] (?:and|&))(?!.*(?:and|&) [AB])
^ : Start of string
(?!.*[AB] (?:and|&)) : Assert that the string does not contain A or B followed by either and or &
(?!.*(?:and|&) [AB]) : Assert that the string does not contain either and or & followed by either A or B
library(stringr)
string_1<-"A and B"
string_2<-"A B"
string_3<-"B and A"
string_4<-"A B and C"
string_5<-"& B"
strings <- c(string_1, string_2, string_3, string_4, string_5)
str_detect(strings, "^(?!.*[AB] (?:and|&))(?!.*(?:and|&) [AB])")
Output
[1] FALSE  TRUE FALSE FALSE FALSE
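Since the question mentions running this over different strings with paste0, here is a minimal sketch of how the same pattern could be built from a vector of terms. The helper name not_joined_pattern is hypothetical, and the sketch assumes the terms contain no regex metacharacters (they are pasted into the pattern unescaped).

```r
library(stringr)

# Hypothetical helper: builds the two-negative-lookahead pattern for any
# set of terms. Assumes the terms contain no regex metacharacters.
not_joined_pattern <- function(terms) {
  # Non-capturing alternation of the terms, e.g. (?:A|B)
  alt <- paste0("(?:", paste(terms, collapse = "|"), ")")
  # First lookahead: no term followed by " and" or " &".
  # Second lookahead: no "and " or "& " followed by a term.
  paste0("^(?!.*", alt, " (?:and|&))(?!.*(?:and|&) ", alt, ")")
}

strings <- c("A and B", "A B", "B and A", "A B and C", "& B")
result <- str_detect(strings, not_joined_pattern(c("A", "B")))
print(result)
```

For the terms A and B this produces a pattern equivalent to the one above, with (?:A|B) in place of [AB], so only "A B" is detected.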