Tags: r, regex, string, pattern-matching, regex-lookarounds

Pattern matching in R if string NOT followed by another string


I am trying to match the following in R using str_detect from the stringr package.

I want to detect whether a given string is followed or preceded by 'and' or '&'. For example, in:

string_1<-"A and B"
string_2<-"A B"
string_3<-"B and A"
string_4<-"A B and C"

I want str_detect(string_X) to be FALSE for string_1, string_3 and string_4 but TRUE for string_2.

I have tried:


str_detect(string_X,paste0(".*(?<!and |& )","A"))==TRUE & str_detect(string_X,paste0(".*","A","(?! and| &).*"))==TRUE


I use paste0 because I want to run this over different strings. This works for all the cases above except string_4. I am new to regex, and this does not seem very elegant. Is there a more general solution?

Thank you.


Solution

  • You can use two negative lookahead assertions to make sure that the string contains neither A or B followed by and or &, nor and or & followed by A or B.

    ^(?!.*[AB] (?:and|&))(?!.*(?:and|&) [AB])
    
    • ^ Start of string
    • (?!.*[AB] (?:and|&)) Assert that the string does not contain A or B followed by either and or &
    • (?!.*(?:and|&) [AB]) Assert that the string does not contain either and or & followed by either A or B


    library(stringr)
    
    string_1<-"A and B"
    string_2<-"A B"
    string_3<-"B and A"
    string_4<-"A B and C"
    string_5<-"& B"
    
    strings <- c(string_1, string_2, string_3, string_4, string_5)
    
    str_detect(strings, "^(?!.*[AB] (?:and|&))(?!.*(?:and|&) [AB])")
    

    Output

    [1] FALSE  TRUE FALSE FALSE FALSE
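
Since the question mentions building the pattern with paste0 so it can be run over different strings, the same two lookaheads could be assembled from a vector of targets. The following is only a sketch: `build_pattern` and its `targets` and `connectors` arguments are hypothetical names, not part of stringr or of the answer above. It uses base R's `grepl(..., perl = TRUE)` so that it runs without extra packages; the resulting pattern string should also be usable with `str_detect`, since ICU supports lookaheads and `\Q...\E` quoting as well.

```r
# Hypothetical helper (an assumption, not from the original answer):
# builds "^(?!.*T (?:C))(?!.*(?:C) T)" for arbitrary targets T and connectors C.
build_pattern <- function(targets, connectors = c("and", "&")) {
  # \Q...\E quotes each piece so it is matched literally
  quote_alt <- function(x) paste0("\\Q", x, "\\E", collapse = "|")
  t_alt <- quote_alt(targets)
  c_alt <- quote_alt(connectors)
  paste0(
    "^(?!.*(?:", t_alt, ") (?:", c_alt, "))",  # no target followed by a connector
    "(?!.*(?:", c_alt, ") (?:", t_alt, "))"    # no connector followed by a target
  )
}

strings <- c("A and B", "A B", "B and A", "A B and C", "& B")
pattern <- build_pattern(c("A", "B"))

# perl = TRUE is needed because lookaheads are not supported by the default engine
result <- grepl(pattern, strings, perl = TRUE)
print(result)
# [1] FALSE  TRUE FALSE FALSE FALSE
```

Like the fixed pattern above, this sketch assumes a single space between a target and a connector and does not add word boundaries, so those would need adjusting for less regular input.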