Following this question:
https://stackoverflow.com/a/24591578/1329812
I am trying to use balanced matching to replace all items within brackets, but in the example the brackets are "{{" and "}}", whereas my brackets would be "<![CDATA[" and "]]>".
I am having trouble modifying the `[^{}]` section of the regular expression in the accepted answer to that question to use my version of brackets instead. I have tried to change `[^{}]` to `(?!(<!\[CDATA\|\]\]>))`.
I have simplified the problem to use 12 as the open bracket and 34 as the close bracket. The following returns "STST" as expected.
using System.Text.RegularExpressions;

string result = Regex.Replace(
    "12T1212E343434STST12RING34", // input
    @"12(?!(12|34))*(((?<Open>12)(?!(12|34))*)+((?<Close-Open>34)(?!(12|34))*)+)*(?(Open)(?!))34", // pattern
    "" // replacement
);
However, it does not work if I replace `12` with `<!\[CDATA\[` and `34` with `\]\]>`.
Finally, I would like to operate on the following CDATA sample string:
"<![CDATA[t<![CDATA[e]]>]]>stst<![CDATA[ring]]>"
which should return "stst".
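Your current `12...34` matching regex is not right because the tempered greedy token it uses is "corrupt": `(?!(12|34))*` is missing the consuming part, the dot (`.`). Without it, the quantified lookahead never advances through the text between the delimiters.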
You just need to keep the four parts of such a regex in mind: 1) the leading delimiter pattern, 2) the trailing delimiter pattern, 3) the part in between, which should match whatever is neither 1 nor 2, and 4) the conditional construct that checks whether the "technical" group capture stack is empty.
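The four parts can be assembled mechanically. As a minimal sketch, assuming both delimiters are plain literal strings, a hypothetical helper `BuildBalancedPattern` (my name, not part of the original answer) could escape each delimiter with `Regex.Escape` and concatenate the pieces. The example is written with C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (an assumption for illustration): builds a balanced
// pattern for any pair of literal delimiters from the four parts listed above.
static string BuildBalancedPattern(string open, string close)
{
    string o = Regex.Escape(open);   // 1) leading delimiter, escaped
    string c = Regex.Escape(close);  // 2) trailing delimiter, escaped
    return o
        + "(?>"
        + "(?!" + o + "|" + c + ")."  // 3) any char that starts neither delimiter
        + "|(?<o>)" + o               //    nested opener: push onto stack "o"
        + "|(?<-o>)" + c              //    nested closer: pop from stack "o"
        + ")*"
        + "(?(o)(?!))"                // 4) fail if stack "o" is not empty
        + c;
}

string numeric = BuildBalancedPattern("12", "34");
string cdata = BuildBalancedPattern("<![CDATA[", "]]>");
Console.WriteLine(numeric);
Console.WriteLine(cdata);
```

Escaping the delimiters up front keeps the caller from having to know which of their characters are regex metacharacters.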
So, the numeric regex can be fixed as
12(?>(?!12|34).|(?<o>)12|(?<-o>)34)*(?(o)(?!))34
and the CDATA one will look like
<!\[CDATA\[(?>(?!<!\[CDATA\[|]]>).|(?<o>)<!\[CDATA\[|(?<-o>)]]>)*(?(o)(?!))]]>
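For a quick check, here is a small sketch that runs both patterns through `Regex.Replace` on the two inputs from the question. The variable names are mine; the patterns and inputs are copied from the text above.

```csharp
using System;
using System.Text.RegularExpressions;

// Patterns from the answer, written as verbatim strings so that the
// backslashes reach the regex engine unchanged.
const string numericPattern = @"12(?>(?!12|34).|(?<o>)12|(?<-o>)34)*(?(o)(?!))34";
const string cdataPattern =
    @"<!\[CDATA\[(?>(?!<!\[CDATA\[|]]>).|(?<o>)<!\[CDATA\[|(?<-o>)]]>)*(?(o)(?!))]]>";

// Inputs from the question.
string numericResult = Regex.Replace("12T1212E343434STST12RING34", numericPattern, "");
string cdataResult = Regex.Replace(
    "<![CDATA[t<![CDATA[e]]>]]>stst<![CDATA[ring]]>", cdataPattern, "");

Console.WriteLine(numericResult); // STST
Console.WriteLine(cdataResult);   // stst
```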
NOTE: If the input string can contain newline characters, use the `RegexOptions.Singleline` option or its inline modifier version, `(?s)`, at the pattern start.
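The effect of the note can be seen with a CDATA section that spans two lines. This is only an illustrative sketch; the sample input is an assumption of mine, not taken from the question.

```csharp
using System;
using System.Text.RegularExpressions;

const string pattern =
    @"<!\[CDATA\[(?>(?!<!\[CDATA\[|]]>).|(?<o>)<!\[CDATA\[|(?<-o>)]]>)*(?(o)(?!))]]>";

// Assumed sample input: the CDATA section contains a newline.
string input = "<![CDATA[line1\nline2]]>keep";

// By default "." does not match "\n", so the section is not matched at all.
string plain = Regex.Replace(input, pattern, "");

// Either form of the Singleline option lets "." cross the newline.
string withOption = Regex.Replace(input, pattern, "", RegexOptions.Singleline);
string withInline = Regex.Replace(input, "(?s)" + pattern, "");

Console.WriteLine(plain == input); // True: nothing was removed
Console.WriteLine(withOption);     // keep
Console.WriteLine(withInline);     // keep
```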
Pattern details:
- `12` - the leading delimiter pattern
- `(?>` - start of the atomic group that will match what is neither the leading nor the trailing pattern, and will keep track of those delimiting substrings:
  - `(?!12|34).|` - match any char (even a newline, if the `RegexOptions.Singleline` option is used) except a char that is the starting point of a `12` or `34` sequence, or
  - `(?<o>)12|` - match `12` and increment the "o" group capture stack, or
  - `(?<-o>)34` - match `34` and decrement the "o" group capture stack
- `)*` - end of the atomic group; repeat it (keep matching) zero or more times
- `(?(o)(?!))` - the conditional construct that checks whether the "o" group capture stack is empty. If it is not empty, the match fails at this point and backtracking is triggered, so that a balanced number of leading and trailing delimiters is searched for.
- `34` - the trailing delimiter pattern.

Also, each `[` in `<![CDATA[` must be escaped, as `[` is a special char outside a character class, while the `]` chars in `]]>` do not have to be escaped, since outside a character class `]` is not special for a .NET regex.
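Two of the points above can be checked directly. The sketch below first compares the pattern with and without the `(?(o)(?!))` check on an unbalanced input that I made up for illustration, then shows what `Regex.Escape` does with the two CDATA delimiters. The variable names are assumptions, not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

const string balanced = @"12(?>(?!12|34).|(?<o>)12|(?<-o>)34)*(?(o)(?!))34";
const string noCheck  = @"12(?>(?!12|34).|(?<o>)12|(?<-o>)34)*34"; // conditional removed

// Assumed unbalanced input: two openers, only one closer.
string unbalanced = "12a12b34";

// With the check, only the balanced inner pair "12b34" is removed.
string withCheck = Regex.Replace(unbalanced, balanced, "");
// Without it, the outer opener is paired with the only closer.
string withoutCheck = Regex.Replace(unbalanced, noCheck, "");

Console.WriteLine(withCheck);           // 12a
Console.WriteLine(withoutCheck.Length); // 0

// Regex.Escape escapes "[" but leaves "]" and ">" untouched.
string escapedOpen = Regex.Escape("<![CDATA[");
string escapedClose = Regex.Escape("]]>");
Console.WriteLine(escapedOpen);  // <!\[CDATA\[
Console.WriteLine(escapedClose); // ]]>

// An unescaped "[" opens a character class that is never closed,
// so constructing the regex fails.
bool unescapedThrows = false;
try { _ = new Regex("<![CDATA["); }
catch (ArgumentException) { unescapedThrows = true; }
Console.WriteLine(unescapedThrows); // True
```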