I can match the string I'm interested in with a regex, but how do I replace a single character inside the matched text?
I want to remove the > character from inside any HTML attribute value, or replace it with &gt;.
Sample original string
<html>
<head></head>
<body>
<div sometag="abc>def" onclick="myfn()" class='xyz'>
Dear {@CustomerName},
blah blah blah
</div></body>
</html>
Desired result
<html>
<head></head>
<body>
<div sometag="abc&gt;def" onclick="myfn()" class='xyz'>
Dear {@CustomerName},
blah blah blah
</div></body>
</html>
I'm using the following regex pattern and replacement:
Pattern: \s\w+\s*=\s*(['"])[^\1]+?\1
Replacement: -- don't know! what should I use? --
This is my VB.NET code, in case it helps:
Dim reAttr As New Regex("\s\w+\s*=\s*(['""])[^\1]+?\1", RegexOptions.Singleline)
result = reAttr.Replace(text, Replace("$&", ">", ""))
You can use a MatchEvaluator callback as the replacement:
Dim reAttr As New Regex("\s\w+\s*=\s*(['""])(?:(?!\1).)*?\1", RegexOptions.Singleline)
Dim result = reAttr.Replace(text, New MatchEvaluator(Function(m As Match)
Return m.Value.Replace(">", "-")
End Function))
Note that [^\1] does not do what you expect. Inside a character class, \1 is not a backreference, so the class matches any char other than the SOH char (\x01). The (?:(?!\1).)*? tempered greedy token does what you wanted: it matches any char that does not start the value captured in Group 1, zero or more times, as few times as possible.
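To see the character class behaviour in isolation, here is a minimal sketch. The module name CharClassDemo and the test strings are made up for illustration. Going by the explanation above, the first check should report no match (the only excluded char is \x01) and the second should report a match (a quote is not excluded).

```vbnet
Imports System
Imports System.Text.RegularExpressions

' Hypothetical demo module; the name and inputs are not from the original post.
Module CharClassDemo
    Sub Main()
        ' [^\1] is a negated class that excludes only U+0001 (SOH),
        ' so a string consisting of just that char should not match.
        Console.WriteLine(Regex.IsMatch(ChrW(1).ToString(), "^[^\1]$"))

        ' A quote char is not excluded by the class, so it should match,
        ' even though one might expect \1 to refer to a captured quote.
        Console.WriteLine(Regex.IsMatch("""", "^[^\1]$"))

        ' The tempered token checks the lookahead (?!\1) before each char,
        ' so it stops right before the quote captured in Group 1.
        Dim m As Match = Regex.Match("x='a""b' y", "(['""])((?:(?!\1).)*?)\1")
        Console.WriteLine(m.Groups(2).Value)
    End Sub
End Module
```

In the last call, Group 2 is added only to expose what the tempered token consumed; the pattern in the answer does not need it.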
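The MatchEvaluator is passed as the replacement argument. Inside it, the whole match value is available as m.Value, so you can modify it before it is put back into the result. Swap "-" for "&gt;" or for an empty string to get the output you asked for.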
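As a rough end-to-end sketch, the same pattern can be run against a shortened version of the sample markup, once escaping the > and once removing it. The module name AttributeEscapeDemo and the variable names escaped and stripped are made up for illustration, and the lambda is written in the single-line form, which VB converts to a MatchEvaluator implicitly.

```vbnet
Imports System
Imports System.Text.RegularExpressions

' Hypothetical demo module; the names are not from the original post.
Module AttributeEscapeDemo
    Sub Main()
        ' Shortened version of the sample markup from the question.
        Dim text As String =
            "<div sometag=""abc>def"" onclick=""myfn()"" class='xyz'>Dear {@CustomerName},</div>"

        Dim reAttr As New Regex("\s\w+\s*=\s*(['""])(?:(?!\1).)*?\1", RegexOptions.Singleline)

        ' Each attribute match is passed to the lambda; only text inside
        ' the match is changed, so the tag's own closing > is left alone.
        Dim escaped As String =
            reAttr.Replace(text, Function(m As Match) m.Value.Replace(">", "&gt;"))
        Dim stripped As String =
            reAttr.Replace(text, Function(m As Match) m.Value.Replace(">", ""))

        Console.WriteLine(escaped)
        ' <div sometag="abc&gt;def" onclick="myfn()" class='xyz'>Dear {@CustomerName},</div>
        Console.WriteLine(stripped)
        ' <div sometag="abcdef" onclick="myfn()" class='xyz'>Dear {@CustomerName},</div>
    End Sub
End Module
```

This stays a regex-level fix: it assumes attribute values are always quoted and that no quoted text resembling an attribute appears outside a tag. For arbitrary HTML, an HTML parser would be the more robust choice.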