What is the balance matching regular expression for removing nested brackets composed of sets of ordered characters?

Following this question:

I am trying to use balanced matching to replace all items within brackets, but in that example the brackets are "{{" and "}}", whereas my brackets are "<![CDATA[" and "]]>".

I am having trouble modifying the [^{}] section of the regular expression in the accepted answer to the previous question to use my version of brackets instead. I have tried to modify [^{}] to (?!(<!\[CDATA\|\]\]>)).

I have simplified the problem to use 12 as the open bracket and 34 as the close bracket. The following returns "STST" as expected.

using System.Text.RegularExpressions;


However, it does not work if I replace 12 with "<!\[CDATA\[" and 34 with "\]\]>".

Finally, I would like to operate on the following CDATA Sample String:


should return



  • Your current 12...34 matching regex is not right since the tempered greedy token used is "corrupt" ((?!(12|34))* is missing the consuming part, .).

    Keep in mind the four parts of such a regex: 1) the leading delimiter pattern, 2) the trailing delimiter pattern, 3) the part in between, which should match whatever is neither 1 nor 2, 4) the conditional construct that checks whether the "technical" group capture stack is empty.

    So, the numeric regex can be fixed as

        12(?>(?!12|34).|(?<o>)12|(?<-o>)34)*(?(o)(?!))34

    (regex demo) and the CDATA one will look like


    See this regex demo
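Since the pattern text itself is easy to mistype, here is a minimal C# sketch of how the two regexes could be applied. The numeric pattern is assembled from the parts listed under "Pattern details"; the CDATA pattern is an assumed reconstruction that substitutes the escaped delimiters into the same structure. The class and method names (`BalancedDemo`, `Strip`) are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class BalancedDemo
{
    // Numeric delimiters: 12 opens, 34 closes.
    public const string NumericPattern =
        @"12(?>(?!12|34).|(?<o>)12|(?<-o>)34)*(?(o)(?!))34";

    // Assumed CDATA variant: same structure, with "[" escaped and "]" left as-is.
    public const string CDataPattern =
        @"<!\[CDATA\[(?>(?!<!\[CDATA\[|]]>).|(?<o>)<!\[CDATA\[|(?<-o>)]]>)*(?(o)(?!))]]>";

    // Removes every balanced (possibly nested) block matched by the pattern.
    public static string Strip(string input, string pattern) =>
        Regex.Replace(input, pattern, string.Empty);
}
```

For instance, `BalancedDemo.Strip("S12a12b34c34T", BalancedDemo.NumericPattern)` should leave only the text outside the outermost 12...34 pair.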

    NOTE: If there can be newline symbols in the string input, use RegexOptions.Singleline option or the inline modifier version, (?s), at the pattern start.
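To illustrate the note above, the following sketch compares the default behavior with `RegexOptions.Singleline` and with the inline `(?s)` modifier on an input that contains a newline inside the delimiters. The class and method names are hypothetical, and the numeric pattern is the one assembled from the parts described in this answer.

```csharp
using System.Text.RegularExpressions;

public static class SinglelineDemo
{
    private const string Pattern =
        @"12(?>(?!12|34).|(?<o>)12|(?<-o>)34)*(?(o)(?!))34";

    // Default options: "." does not match "\n", so a block spanning lines is left alone.
    public static string StripDefault(string s) =>
        Regex.Replace(s, Pattern, string.Empty);

    // RegexOptions.Singleline lets "." match "\n" as well.
    public static string StripSingleline(string s) =>
        Regex.Replace(s, Pattern, string.Empty, RegexOptions.Singleline);

    // Equivalent inline modifier placed at the pattern start.
    public static string StripInline(string s) =>
        Regex.Replace(s, "(?s)" + Pattern, string.Empty);
}
```

With the input `"S12a\nb34T"`, only the two Singleline variants should remove the delimited block.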

    Pattern details:

    • 12 - the leading delimiter pattern
    • (?> - start of the atomic group that will match what is neither leading nor trailing patterns, and will keep track of those delimiting substrings:
      • (?!12|34).| - match any char (even a newline, if the RegexOptions.Singleline option is used) except a char that is the starting point of a 12 or 34 sequence, or
      • (?<o>)12| - match 12 and increment the "o" group capture stack, or
      • (?<-o>)34 - match 34 and decrement the "o" group capture stack
    • )* - and repeat that (keep matching) zero or more occurrences of the patterns inside the atomic group
    • (?(o)(?!)) - the conditional construct that checks whether the "o" group capture stack is empty. If it is not empty, the match fails at this point and backtracking is triggered, so that a balanced number of leading/trailing delimiters is searched for.
    • 34 - the trailing delimiter pattern.

    Also, [ in <![CDATA[ must be escaped, as [ is a special char outside a character class, while ] in ]]> does not have to be escaped, since outside a character class ] is not special for a .NET regex.
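As a rough sketch of how the four parts and the escaping rule could be combined for arbitrary delimiters, the helper below builds the pattern from plain-text open and close strings and lets `Regex.Escape` handle the escaping. `BalancedPattern`, `Build`, and `RemoveAll` are hypothetical names, and using `RegexOptions.Singleline` by default is an assumption, not something the answer requires.

```csharp
using System.Text.RegularExpressions;

public static class BalancedPattern
{
    // Builds: open (?> not-a-delimiter char | push open | pop close )* stack-empty-check close
    public static string Build(string open, string close)
    {
        string openEsc = Regex.Escape(open);    // escapes "[" but leaves "]" untouched
        string closeEsc = Regex.Escape(close);
        return $"{openEsc}(?>(?!{openEsc}|{closeEsc}).|(?<o>){openEsc}|(?<-o>){closeEsc})*(?(o)(?!)){closeEsc}";
    }

    // Removes every balanced block, including nested ones, from the input.
    public static string RemoveAll(string input, string open, string close) =>
        Regex.Replace(input, Build(open, close), string.Empty, RegexOptions.Singleline);
}
```

Called as `BalancedPattern.RemoveAll(text, "<![CDATA[", "]]>")`, this avoids hand-escaping the delimiters.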