
Regex: Matching all words EXCEPT those inside parentheses (C#)


So given:

COLUMN_1, COLUMN_2, COLUMN_3, ((COLUMN_1) AS SOME TEXT) AS COLUMN_4, COLUMN_5

How would I go about getting my matches as:

COLUMN_1
COLUMN_2
COLUMN_3
COLUMN_4
COLUMN_5

I've tried:

(?<!(\(.*?\)))(\w+)(,\s*\w+)*?

But I feel like I'm way off base :( I'm using regexstorm.net for testing.

Appreciate any help :)
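Here is a minimal C# sketch that shows why the attempt above misses. It assumes a .NET 6+ console app with top-level statements, and the variable names are made up for illustration. It runs the question's pattern against the sample input. The lookbehind `(?<!(\(.*?\)))` can only reject a word that starts immediately after a `)`. No word in this input does, so every word comes back, including `COLUMN_1`, `AS`, `SOME` and `TEXT` from inside the parentheses. A lookbehind only inspects the text right before the word. It cannot tell you whether the word is nested inside a pair of parentheses.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sample input from the question.
string input = "COLUMN_1, COLUMN_2, COLUMN_3, ((COLUMN_1) AS SOME TEXT) AS COLUMN_4, COLUMN_5";

// The pattern from the question, unchanged. The lazy trailing group
// (,\s*\w+)*? is satisfied by zero repetitions, so each match is a single word.
var attempt = new Regex(@"(?<!(\(.*?\)))(\w+)(,\s*\w+)*?");

// Collect the text of every match.
string[] attemptWords = attempt.Matches(input)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

// Prints all ten words, including COLUMN_1, AS, SOME and TEXT from inside the parentheses.
Console.WriteLine(string.Join(", ", attemptWords));
```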


Solution

  • You need a regex that keeps track of opening and closing parentheses and only matches a word if the rest of the string after it contains balanced parentheses (or no parentheses at all), i.e. if the word is not nested inside any pair:

    using System.Text.RegularExpressions;

    Regex regexObj = new Regex(
        @"\w+                  # Match a word
        (?=                    # only if it's possible to match the following:
            (?>                # Atomic group (used to avoid catastrophic backtracking):
               [^()]+          # Match any characters except parens
            |                  # or
               \(  (?<DEPTH>)  # a (, increasing the depth counter
            |                  # or
               \)  (?<-DEPTH>) # a ), decreasing the depth counter
            )*                 # any number of times.
            (?(DEPTH)(?!))     # Then make sure the depth counter is zero again
            $                  # at the end of the string.
        )                      # (End of lookahead assertion)", 
        RegexOptions.IgnorePatternWhitespace);
    

    I tried to provide a test link to regexstorm.net, but it was too long for Stack Overflow. SO also doesn't seem to allow URL shorteners, so I can't link it directly, but you should be able to recreate the link easily: http://bit[dot]ly/2cNZS0O
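Here is a minimal, self-contained sketch that applies the pattern from the solution to the sample input. It assumes a .NET 6+ console app with top-level statements, and the pattern is written on one line without `IgnorePatternWhitespace`. Note that it also returns `AS`. The second `AS` sits outside every pair of parentheses, so it passes the depth check just like the column names. The keyword filter at the end is an assumption and not part of the answer. It presumes `AS` is the only keyword that can appear at the top level.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sample input from the question.
string input = "COLUMN_1, COLUMN_2, COLUMN_3, ((COLUMN_1) AS SOME TEXT) AS COLUMN_4, COLUMN_5";

// Same tokens as the answer's pattern, on one line (so no IgnorePatternWhitespace):
//   \w+                 a word,
//   (?= ... $)          followed by the rest of the string, scanned as
//   [^()]+              runs of non-paren characters,
//   \((?<DEPTH>)        an opening paren (push a DEPTH capture),
//   \)(?<-DEPTH>)       a closing paren (pop one; fails if none are open),
//   (?(DEPTH)(?!))      with no DEPTH captures left over at the end.
var topLevelWord = new Regex(@"\w+(?=(?>[^()]+|\((?<DEPTH>)|\)(?<-DEPTH>))*(?(DEPTH)(?!))$)");

string[] words = topLevelWord.Matches(input)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

// Prints: COLUMN_1, COLUMN_2, COLUMN_3, AS, COLUMN_4, COLUMN_5
Console.WriteLine(string.Join(", ", words));

// Hypothetical post-filter (not part of the answer): drop the SQL keyword AS,
// assuming it is the only keyword that can appear outside the parentheses.
string[] columns = words
    .Where(w => !string.Equals(w, "AS", StringComparison.OrdinalIgnoreCase))
    .ToArray();

// Prints: COLUMN_1, COLUMN_2, COLUMN_3, COLUMN_4, COLUMN_5
Console.WriteLine(string.Join(", ", columns));
```

The lookahead rescans the remainder of the string for every candidate word. As a result, the work grows roughly quadratically with the input length. That is fine for a column list of this size.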