
Regex for matching incorrect AND OR logic


I need a regex that matches incorrectly placed AND / OR operators in a logic statement, but not when the statement is in quotes. For example:

No matches should be found in:

MAR AND SATURN
MAR OR SATURN
"MAR AND SATURN"

Nothing should match when AND or OR has at least one whitespace character followed by at least one non-whitespace character on both sides, and those neighbouring characters are not themselves AND or OR. So, for example, ..R AND S.. should not match, but (OR) OR (OR) or (AND) AND (AND) should.

Matches should be found in the following, where brackets mark the matched text:

  MARS AND SATURN [AND]
  MARS [OR]
  MARS [ OR ]
  [AND] AND [AND]
  [OR] [AND]
  [OR] [AND]
  [AND] [OR]
  [ AND ] [ OR ]

You will notice that some examples contain whitespace before, after, or on both sides of the AND or OR operator; these need to match as well.
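
To make the rule above concrete, here is a rough token-based sketch in Python rather than a regex. It is only one reading of the examples: the names `OPERATORS`, `is_fully_quoted`, and `misplaced_operators` are made up for illustration, splitting on whitespace is an assumption, and it flags every operator in a run such as AND AND AND, which may be stricter than what is actually needed.

```python
OPERATORS = {"AND", "OR"}


def is_fully_quoted(query):
    # True when the whole query is a single quoted string.
    q = query.strip()
    if len(q) < 2 or q[0] not in "\"'" or q[-1] != q[0]:
        return False
    # No further quote of the same kind inside, so it really is one string.
    return q[0] not in q[1:-1]


def misplaced_operators(query):
    # Return the token indexes of AND/OR operators that lack an operand.
    if is_fully_quoted(query):
        return []
    tokens = query.split()
    bad = []
    for i, tok in enumerate(tokens):
        if tok not in OPERATORS:
            continue
        # An operand is any neighbouring token that is not itself an operator.
        has_left = i > 0 and tokens[i - 1] not in OPERATORS
        has_right = i + 1 < len(tokens) and tokens[i + 1] not in OPERATORS
        if not (has_left and has_right):
            bad.append(i)
    return bad
```

A function like this could serve as a reference to compare a candidate regex against, one example at a time.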

I'm using the .NET framework, and this is what I came up with. It works, but it seems too complicated; there has to be a way to simplify it.

(?!.*\"")(?<!(?:\bAND\b\s|\bOR\b\s))(?:\b(?:AND|OR)\b)(?=\s\b(?:AND|OR)\b)|(?<=\bAND\b\s|\bOR\b\s)(?:\b(?:AND|OR)\b)(?!\s\b(?:AND|OR)\b)|^\b(?:AND|OR)\b|(?:AND\s?|OR\s?)$|(?<=\()\s?(?:\bAND\b|\bOR\b)|(?<=\()(?:\bOR|\bAND)(?=\))|(?:\bOR|\bAND)(?=\))(?!.*\"")

Solution

  • I think this will do:

    ^ *'[^']*' *$|^ *"[^"]*" *$|(\b(AND|OR)\b) +(?1)|(?1)\s*$|^\s*(?1)
    

    Demo: https://regex101.com/r/nD9yR3/2

    Explanation:

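    Note that this regex matches the invalid strings, not the valid ones.

    1. (?1) is a subroutine call: it re-applies the pattern of group 1, here \b(AND|OR)\b, rather than the text that group matched. This is PCRE syntax, which the demo uses; the .NET regex engine does not support (?1), so there the group's pattern has to be written out each time.
    2. ^ *'[^']*' *$|^ *"[^"]*" *$| consumes a line that is entirely inside single or double quotes so that it can be ignored. Treat a result as a real match only if group 1 has a value, not merely because group 0 (the overall match) does.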
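
For engines without the (?1) subroutine call, the same idea can be sketched with the operator pattern written out by hand. The version below is a minimal Python illustration, not the answer's exact regex: Python's `re` module has no (?1), so `OP` is interpolated instead, and the three operator alternatives are wrapped in one named group, `bad`, so that a single check separates them from the quoted-string alternatives. The names `OP`, `QUOTED`, `BAD`, `PATTERN`, and `has_misplaced_operator` are hypothetical.

```python
import re

# The pattern that (?1) would re-apply, written out by hand.
OP = r"\b(?:AND|OR)\b"

# A line that is entirely one quoted string: consumed, but never captured.
QUOTED = r"""^ *'[^']*' *$|^ *"[^"]*" *$"""

# Two operators in a row, a trailing operator, or a leading operator.
BAD = rf"(?P<bad>{OP} +{OP}|{OP}\s*$|^\s*{OP})"

PATTERN = re.compile(f"{QUOTED}|{BAD}")


def has_misplaced_operator(query):
    # Only matches that filled the "bad" group count; quoted lines leave it unset.
    return any(m.group("bad") is not None for m in PATTERN.finditer(query))


if __name__ == "__main__":
    for q in ["MAR AND SATURN", '"MAR AND SATURN"', "MARS AND SATURN AND", "OR AND"]:
        print(repr(q), has_misplaced_operator(q))
```

Like the pattern above, this only skips a query that is quoted in its entirety; a quoted phrase inside a longer query is not treated specially. Because it avoids (?1), the expanded pattern should carry over to .NET with the named group written as (?<bad>...), though that port is an untested assumption.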