Match paren-groups only when not preceded by tab or space


My C# regex code:

Regex regex = new Regex(@"\((.*?)\)");
return regex.Matches(str);

...nicely matches all the "paren groups" as in the data block below:

(dirty FALSE)
(composite [txtModel])
(view [star2])
(creationIndex 0)
(creationProps )
(instanceNameSpecified FALSE)
(containsObject nil)
(sName ApplicationWindow)
(txtDynamic FALSE)
(txtSubComposites )
(txtSubObjects )
(txtSubConnections )

But the following block of data throws it off the rails:

([vog317] of ZZconstant
(dirty FALSE)
(composite [gpGame])
(view [nil])
(creationIndex 1)
(creationProps composite !/gpGame sName Constraint4)
(instanceNameSpecified TRUE)
(containsObject ZZconstant)
(sName NoGo_Track_back_Co)
(description "")
(parameters "")
(languageType Prefix)
(explanation "Some sample text here!")
(salience 1)
(condition "

        (if     (eq ?hoer9_Cl:sName extens)

                then

            (or (eq ?Starry:sName sb405)
                (eq ?Starry:sName sb43)
                (eq ?Starry:sName sb455)
                (eq ?Starry:sName sb48)
            )

        )

")
)

Please note the inner-paren group:

       (if      (eq ?hoer9_Cl:sName extens)

                then

            (or (eq ?Starry:sName sb405)
                (eq ?Starry:sName sb43)
                (eq ?Starry:sName sb455)
                (eq ?Starry:sName sb48)
            )

        )

That little sub-block of paren-enclosed data should be treated as part of the (condition paren-group and should not be matched on its own by the regex pattern. To exclude it, the pattern needs to honor the following 2 exceptions:

  • Any ( preceded by a tab or space should be excluded from the match.
  • Any (if followed by any kind of whitespace should be excluded from the match.

So how can I modify my regex pattern \((.*?)\) so that it complies with the above 2 rules? I tried for a while in Regex Storm, but I'm too much of a beginner with regex to work it out.
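
To make the failure concrete, here is a minimal sketch that runs the original pattern over a shortened stand-in for the problem block. The class name OriginalPatternDemo and the helper FindGroups are made up for illustration, and the input is a trimmed version of the data above, not the full file.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class OriginalPatternDemo
{
    // The original pattern from the question, unchanged.
    private static readonly Regex ParenGroup = new Regex(@"\((.*?)\)");

    // Hypothetical helper: returns the full text of every match, in order.
    public static List<string> FindGroups(string input)
    {
        var result = new List<string>();
        foreach (Match m in ParenGroup.Matches(input))
        {
            result.Add(m.Value);
        }
        return result;
    }

    public static void Main()
    {
        // Shortened stand-in for the problem block.
        string input =
            "(salience 1)\n" +
            "(condition \"\n" +
            "        (if     (eq ?hoer9_Cl:sName extens)\n" +
            "        )\n" +
            "\")\n";

        foreach (string group in FindGroups(input))
        {
            Console.WriteLine("[" + group + "]");
        }
    }
}
```

On this input the pattern should report (salience 1) as wanted, but it also picks up the fragment (if     (eq ?hoer9_Cl:sName extens) from inside the condition text, which is the unwanted match described above.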


Solution

  • You could use the pattern that you tried, and add lookarounds for the logic in the 2 exceptions listed:

    (?<![ \t])\((?!if\s)(.*?)\)
    

    Explanation

    • (?<![ \t]) Negative lookbehind (1st point): assert that what is directly to the left is not a space or tab.
    • \( Match a literal (.
    • (?!if\s) Negative lookahead (2nd point): assert that what is directly to the right is not if followed by a whitespace character.
    • (.*?) Capture group 1: match any character except a newline, as few times as possible (non-greedy).
    • \) Match a literal ).
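
    As a rough illustration of how this pattern behaves in C#, here is a minimal sketch. The class name LookaroundDemo and the helper FindGroups are hypothetical, and the input is a shortened stand-in for the data in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookaroundDemo
{
    // Pattern from the answer: reject "(" preceded by a space or tab,
    // and reject "(if" followed by a whitespace character.
    private static readonly Regex ParenGroup =
        new Regex(@"(?<![ \t])\((?!if\s)(.*?)\)");

    // Hypothetical helper: returns the full text of every match, in order.
    public static List<string> FindGroups(string input)
    {
        var result = new List<string>();
        foreach (Match m in ParenGroup.Matches(input))
        {
            result.Add(m.Value);
        }
        return result;
    }

    public static void Main()
    {
        // Shortened stand-in for the problem block in the question.
        string input =
            "(explanation \"Some sample text here!\")\n" +
            "(salience 1)\n" +
            "(condition \"\n" +
            "        (if     (eq ?hoer9_Cl:sName extens)\n" +
            "        )\n" +
            "\")\n";

        foreach (string group in FindGroups(input))
        {
            Console.WriteLine(group);
        }
    }
}
```

    On this input the sketch should print only the (explanation ...) and (salience 1) groups. The indented (if and (eq groups are skipped by the lookbehind. Note that the multi-line (condition group is not matched at all here, because . does not match a newline by default and there is no ) on that group's first line.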


    If the text between the opening and closing parentheses can span multiple lines, you could also use a negated character class [^()], which matches any character except ( and ), including newlines:

    (?<![ \t])\((?!if\s)([^()]*)\)
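
    A minimal C# sketch of this variant follows. The class name NegatedClassDemo, the helper FindGroups, and the (note ...) field are all made up for illustration; the (note ...) field is there only to show a group whose text spans two lines.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NegatedClassDemo
{
    // [^()]* matches any run of characters other than "(" and ")",
    // so a match may cross newlines but can never contain a nested paren.
    private static readonly Regex ParenGroup =
        new Regex(@"(?<![ \t])\((?!if\s)([^()]*)\)");

    // Hypothetical helper: returns the full text of every match, in order.
    public static List<string> FindGroups(string input)
    {
        var result = new List<string>();
        foreach (Match m in ParenGroup.Matches(input))
        {
            result.Add(m.Value);
        }
        return result;
    }

    public static void Main()
    {
        string input =
            "(salience 1)\n" +
            "(note \"first line\n" +        // hypothetical multi-line field
            "second line\")\n" +
            "(condition \"\n" +
            "        (if     (eq ?hoer9_Cl:sName extens)\n" +
            "        )\n" +
            "\")\n";

        foreach (string group in FindGroups(input))
        {
            Console.WriteLine("[" + group + "]");
        }
    }
}
```

    Here the two-line (note ...) group should be matched as a whole, along with (salience 1). The (condition group is still skipped, because its text contains nested parentheses that [^()]* cannot cross. If that group has to be captured as well, a different approach (for example .NET balancing groups) would be needed.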
    
