
Extracting all overlapping substrings between nested matching parentheses with a .NET regex


I'm trying to parse mathematical expressions with nested brackets:

(1 * (2 - 3)) + 4

I want to get every expression in brackets, like this:

  • (1 * (2 - 3))
  • (2 - 3)

Using this expression: (.*?\))(?=($|[^(]+)) I'm getting this result:

(1 * (2 - 3)

)

And using this expression: \(.*?\) I'm getting this result:

(1 * (2 - 3) 

But neither works correctly. How can I also match the expressions nested inside the outer brackets?


Solution

  • You can use

    (?=(\((?>[^()]+|(?<c>)\(|(?<-c>)\))*(?(c)(?!))\)))
    

    See the regex demo. Details:

    • (?= - start of a positive lookahead; since a lookahead consumes no text, matches can overlap, which is how the nested groups are found too:
      • (\((?>[^()]+|(?<c>)\(|(?<-c>)\))*(?(c)(?!))\)) - Group 1:
        • \( - a ( char
        • (?>[^()]+|(?<c>)\(|(?<-c>)\))* - zero or more repetitions of: one or more chars other than ( and ), or a ( char (with a value pushed onto the Group "c" stack), or a ) char (with a value popped from the Group "c" stack)
        • (?(c)(?!)) - if the Group "c" stack is not empty, fail and backtrack
        • \) - a ) char
    • ) - end of the lookahead.
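
    The breakdown above can also be written into the pattern itself. The following is only a sketch, assuming RegexOptions.IgnorePatternWhitespace is acceptable in your code: it is the same pattern spread over several lines with # comments, and the layout and comment wording are mine rather than part of the original answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Same pattern as in the answer, laid out with inline comments.
// Whitespace and '#' comments are ignored only because
// RegexOptions.IgnorePatternWhitespace is passed below.
var pattern = @"
    (?=                  # lookahead: consumes nothing, so matches may overlap
      (                  # Group 1 captures the balanced substring
        \(               # opening parenthesis
        (?>              # atomic group, repeated zero or more times
            [^()]+       # run of non-parenthesis chars
          | (?<c>) \(    # push onto stack c, then an opening parenthesis
          | (?<-c>) \)   # pop from stack c, then a closing parenthesis
        )*
        (?(c)(?!))       # fail if stack c still has unmatched entries
        \)               # closing parenthesis
      )
    )";

var results = Regex.Matches("(1 * (2 - 3)) + 4", pattern,
        RegexOptions.IgnorePatternWhitespace)
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)   // the lookahead match itself is empty
    .ToList();

Console.WriteLine(string.Join(", ", results));
```

    Each match is zero-length, so the text of interest is read from Group 1 rather than from the match value.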

    See the C# demo:

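    using System;
    using System.Linq;
    using System.Text.RegularExpressions;

    var text = "(1 * (2 - 3)) + 4";
    var pattern = @"(?=(\((?>[^()]+|(?<c>)\(|(?<-c>)\))*(?(c)(?!))\)))";
    // Each match is an empty lookahead; the balanced substring is in Group 1.
    var results = Regex.Matches(text, pattern)
        .Cast<Match>()
        .Select(m => m.Groups[1].Value)
        .ToList();
    Console.WriteLine(String.Join(", ", results));
    // => (1 * (2 - 3)), (2 - 3)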
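
    For comparison, the push and pop that the Group "c" stack performs can be sketched without a regex. This is not the answer's method: BalancedGroups is a hypothetical helper name, and the choice to silently skip unmatched parentheses is an assumption made for this illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not from the answer): collects every balanced
// parenthesized substring using an explicit stack of '(' positions.
static List<string> BalancedGroups(string text)
{
    var found = new List<(int Start, string Value)>();
    var open = new Stack<int>();   // indices of '(' chars not yet closed

    for (int i = 0; i < text.Length; i++)
    {
        if (text[i] == '(')
        {
            open.Push(i);
        }
        else if (text[i] == ')' && open.Count > 0)
        {
            // Closing paren pairs with the most recent unmatched '('.
            int start = open.Pop();
            found.Add((start, text.Substring(start, i - start + 1)));
        }
        // A ')' with nothing on the stack is skipped (assumption).
    }

    // Inner groups close first, so sort by start index to list
    // outer groups before the groups nested inside them.
    found.Sort((a, b) => a.Start.CompareTo(b.Start));
    return found.ConvertAll(f => f.Value);
}

Console.WriteLine(string.Join(", ", BalancedGroups("(1 * (2 - 3)) + 4")));
// => (1 * (2 - 3)), (2 - 3)
```

    A single pass like this may be easier to debug when you also need positions or nesting depth, while the regex keeps everything in one expression.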