c# .net regex balancing-groups

Regex with balancing groups


I need to write a regex that captures the generic arguments (which can themselves be generic) of a type name written in a special notation like this:

System.Action[Int32,Dictionary[Int32,Int32],Int32]

Let's assume a type name is [\w.]+ and a parameter is [\w.,\[\]]+, so I need to grab only Int32, Dictionary[Int32,Int32] and Int32.

Basically, I need to capture something only when the balancing group stack is empty, but I don't really understand how.
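
To make the "stack is empty" condition concrete, here is a minimal sketch that is not a regex at all: a plain character scan with a depth counter, which is the state a balancing group stack models. The helper name SplitTopLevelArguments is hypothetical, and the code assumes C# 9 or later (top-level statements). A comma splits an argument only when the scan is directly inside the outermost brackets.

```csharp
using System;
using System.Collections.Generic;

var arguments = SplitTopLevelArguments("System.Action[Int32,Dictionary[Int32,Int32],Int32]");
foreach (var a in arguments)
    Console.WriteLine(a);
// Prints:
// Int32
// Dictionary[Int32,Int32]
// Int32

// Hypothetical helper (not from the original post): returns the top-level
// generic arguments of a type name written in the bracket notation above.
static List<string> SplitTopLevelArguments(string typeName)
{
    var result = new List<string>();
    int start = typeName.IndexOf('[');
    if (start < 0)
        return result;                     // no brackets: no generic arguments

    int depth = 0;                         // plays the role of the "open" stack
    int argStart = start + 1;
    for (int i = start; i < typeName.Length; i++)
    {
        char c = typeName[i];
        if (c == '[')
        {
            depth++;
        }
        else if (c == ']')
        {
            depth--;
            if (depth == 0)                // outermost bracket closed: last argument
            {
                result.Add(typeName.Substring(argStart, i - argStart));
                return result;
            }
        }
        else if (c == ',' && depth == 1)   // comma outside any nested brackets
        {
            result.Add(typeName.Substring(argStart, i - argStart));
            argStart = i + 1;
        }
    }
    throw new FormatException("Unbalanced brackets in: " + typeName);
}
```

In the regex versions, the test `depth == 1` corresponds to checking that the stack of nested opening brackets is empty before cutting a parameter at a comma.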

UPD

The answer below helped me solve the problem quickly (but without proper validation and with a nesting depth limited to 1), but I've since managed to do it with balancing groups:

^[\w.]+                                              #Type name
\[(?<delim>)                                         #Opening bracket and first delimiter
[\w.]+                                               #Minimal content
(
[\w.]+                                                       
((?(open)|(?<param-delim>)),(?(open)|(?<delim>)))*   #Cutting param if balanced before comma and placing delimiter
((?<open>\[))*                                       #Counting [
((?<-open>\]))*                                      #Counting ]
)*
(?(open)|(?<param-delim>))\]                         #Cutting last param if balanced
(?(open)(?!)                                         #Checking balance
)$


UPD2 (Last optimization)

^[\w.]+
\[(?<delim>)
[\w.]+
(?:
 (?:(?(open)|(?<param-delim>)),(?(open)|(?<delim>))[\w.]+)?
 (?:(?<open>\[)[\w.]+)?
 (?:(?<-open>\]))*
)*
(?(open)|(?<param-delim>))\]
(?(open)(?!)
)$
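
For reference, a sketch of how the optimized pattern above might be run from C#. It assumes C# 9 or later (top-level statements) and that the pattern is compiled with RegexOptions.IgnorePatternWhitespace, since it is spread over several lines. The # comments inside the pattern and the variable names are additions for illustration. Each top-level argument ends up as one capture of the param group.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var pattern = @"
^[\w.]+                              # type name
\[(?<delim>)                         # opening bracket, mark start of first param
[\w.]+
(?:
 (?:(?(open)|(?<param-delim>)),(?(open)|(?<delim>))[\w.]+)?   # cut param at a top-level comma
 (?:(?<open>\[)[\w.]+)?              # push each nested [
 (?:(?<-open>\]))*                   # pop for each nested ]
)*
(?(open)|(?<param-delim>))\]         # cut the last param if balanced
(?(open)(?!)                         # fail if any [ is still open
)$";

var regex = new Regex(pattern, RegexOptions.IgnorePatternWhitespace);
var match = regex.Match("System.Action[Int32,Dictionary[Int32,Int32],Int32]");

// Every successful (?<param-delim>) adds one capture to the "param" group.
var parameters = match.Groups["param"].Captures
    .Cast<Capture>()
    .Select(c => c.Value)
    .ToList();

foreach (var p in parameters)
    Console.WriteLine(p);
// Prints:
// Int32
// Dictionary[Int32,Int32]
// Int32
```

Because the pattern is anchored with ^ and $ and ends with the (?(open)(?!)) check, an input with a missing closing bracket produces no match at all rather than a partial list.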

Solution

  • I suggest capturing those values using

    \w+(?:\.\w+)*\[(?:,?(?<res>\w+(?:\[[^][]*])?))*
    


    Details:

    • \w+(?:\.\w+)* - match 1+ word chars followed by zero or more sequences of a . and 1+ word chars
    • \[ - a literal [
    • (?:,?(?<res>\w+(?:\[[^][]*])?))* - 0 or more sequences of:
      • ,? - an optional comma
      • (?<res>\w+(?:\[[^][]*])?) - Group "res" capturing:
        • \w+ - one or more word chars (perhaps, you would like [\w.]+)
        • (?:\[[^][]*])? - 1 or 0 (change ? to * to match 0 or more) sequences of a [, 0+ chars other than [ and ], and a closing ].

    A C# demo below:

    using System;
    using System.Linq;
    using System.Text.RegularExpressions;

    var line = "System.Action[Int32,Dictionary[Int32,Int32],Int32]";
    var pattern = @"\w+(?:\.\w+)*\[(?:,?(?<res>\w+(?:\[[^][]*])?))*";
    // Collect every capture of the "res" group from every match.
    var result = Regex.Matches(line, pattern)
            .Cast<Match>()
            .SelectMany(x => x.Groups["res"].Captures.Cast<Capture>()
                .Select(t => t.Value))
            .ToList();
    foreach (var s in result)
        Console.WriteLine(s);
    

    UPDATE: To account for [...] substrings nested to an unknown depth, use

    \w+(?:\.\w+)*\[(?:\s*,?\s*(?<res>\w+(?:\[(?>[^][]+|(?<o>\[)|(?<-o>]))*(?(o)(?!))])?))*
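
A short sketch of this pattern applied to an argument nested two levels deep. The sample input and the variable names are assumptions chosen for illustration (they do not appear in the question), and the code assumes C# 9 or later (top-level statements).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Assumed sample input: List[Int32] sits inside Dictionary[...], i.e. depth 2.
var nestedLine = "System.Action[Int32,Dictionary[Int32,List[Int32]],Int32]";
var nestedPattern =
    @"\w+(?:\.\w+)*\[(?:\s*,?\s*(?<res>\w+(?:\[(?>[^][]+|(?<o>\[)|(?<-o>]))*(?(o)(?!))])?))*";

// (?<o>\[) pushes on every nested [, (?<-o>]) pops on every nested ],
// and (?(o)(?!)) rejects the bracketed part if anything is left on the stack.
var nestedResult = Regex.Matches(nestedLine, nestedPattern)
    .Cast<Match>()
    .SelectMany(m => m.Groups["res"].Captures.Cast<Capture>())
    .Select(c => c.Value)
    .ToList();

foreach (var s in nestedResult)
    Console.WriteLine(s);
// Prints:
// Int32
// Dictionary[Int32,List[Int32]]
// Int32
```

Note that this pattern is not anchored, so it extracts arguments without validating that the whole type name is well formed.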
    
