html regex vba regex-negation regexp-replace

RegEx replace only occurrences outside of <h> html tags

I would like to use a regex to replace "Plus" in the text below, but only when it is not wrapped in a header tag:

<h4 class="Somethingsomething" id="something">Plus plan</h4>The <b>Plus</b> plan starts at $14 per person per month and comes with everything from Basic.

In the example above, I would like to replace the second "Plus" but not the first.

My regex attempt so far is:

(?!<h\d*>)\bPlus\b(?!<\\h>)

Meaning:

Do not capture the following if it is inside a <h tag with one digit and zero or more characters, ending with a closing <\h>
Capture "Plus" only if it is surrounded by spaces or other whitespace

However, this matches both occurrences. Can someone point out my mistake and correct it?
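A quick way to confirm the behaviour is a minimal Python sketch (Python is used only as a stand-in for VBA's RegExp; the lookahead syntax is the same). It runs the attempted pattern, copied verbatim, against the sample text. Both lookaheads are evaluated right where "Plus" starts or ends, so the leading one never sees the `<h4 ...>` tag that comes before the word, and the trailing one only looks for the literal text `<\h>`, which never occurs. The variable names are illustrative.

```python
import re

# Sample text from the question
text = ('<h4 class="Somethingsomething" id="something">Plus plan</h4>'
        'The <b>Plus</b> plan starts at $14 per person per month '
        'and comes with everything from Basic.')

# The attempted pattern, copied verbatim from the question
attempt = re.compile(r"(?!<h\d*>)\bPlus\b(?!<\\h>)")

# (?!<h\d*>) is checked at the "P" of "Plus", so it only ever sees "Plus...".
# (?!<\\h>) only rejects a literal "<\h>" directly after the word.
# Neither condition can fail here, so every "Plus" is matched.
matches = [m.start() for m in attempt.finditer(text)]
print(len(matches))
```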

I want to use this in VBA but should be a general regex question, as far as I understand.

Not relevant, as not RegEx

Solution

You can use

\bPlus\b(?![^>]*<\/h\d+>)

See the regex demo. To reuse the whole match in the replacement string, use the $& backreference in your VBA code.
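The replacement can be sketched in Python, assuming Python's `re` as a substitute for VBA's RegExp object. Python writes the whole-match backreference as `\g<0>` where VBA uses `$&`. The `<mark>...</mark>` wrapper is only a hypothetical replacement chosen to make the change visible.

```python
import re

# Sample text from the question
text = ('<h4 class="Somethingsomething" id="something">Plus plan</h4>'
        'The <b>Plus</b> plan starts at $14 per person per month '
        'and comes with everything from Basic.')

# The answer's pattern; "\/" is just an escaped "/" and is accepted by Python's re
pattern = re.compile(r"\bPlus\b(?![^>]*<\/h\d+>)")

# \g<0> re-inserts the whole match (the equivalent of $& in VBA).
# <mark> is a hypothetical wrapper used only for illustration.
result = pattern.sub(r"<mark>\g<0></mark>", text)
print(result)
```

Only the "Plus" inside `<b>...</b>` gets wrapped. The one in the `<h4>` header is left untouched because `</h4>` follows it before any `>` character.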

Details:

- \bPlus\b - the whole word Plus
- (?![^>]*<\/h\d+>) - a negative lookahead that fails the match if, immediately to the right of the current location, there is the following sequence:
  - [^>]* - zero or more characters other than >
  - <\/h - the </h string
  - \d+ - one or more digits
  - > - a > character.
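The lookahead's body can be checked on its own. The minimal Python sketch below (again using Python's `re` as a stand-in for VBA) tests `[^>]*<\/h\d+>` directly after each "Plus", which is the check the negative lookahead performs. The names `closing_header_ahead`, `flags` and `nested` are illustrative only.

```python
import re

# Sample text from the question
text = ('<h4 class="Somethingsomething" id="something">Plus plan</h4>'
        'The <b>Plus</b> plan starts at $14 per person per month '
        'and comes with everything from Basic.')

# The lookahead body: no ">" until a closing </hN> tag
closing_header_ahead = re.compile(r"[^>]*<\/h\d+>")

flags = []
for m in re.finditer(r"\bPlus\b", text):
    # Pattern.match(string, pos) anchors the test right after "Plus",
    # just as the negative lookahead does
    flags.append(closing_header_ahead.match(text, m.end()) is not None)
print(flags)  # True means "inside a header", so the full pattern skips it

# Limitation: [^>]* stops at the ">" of an inner closing tag,
# so a "Plus" nested in another tag inside a header is still matched
nested = "<h4><b>Plus</b></h4>"
nested_matched = re.search(r"\bPlus\b(?![^>]*<\/h\d+>)", nested) is not None
print(nested_matched)
```

The first "Plus" is followed by ` plan</h4>`, so the body matches and the word is rejected. The second is followed by `</b>`, whose `>` stops `[^>]*` before any `</h` can be found. The same mechanism explains the nested case: a header such as `<h4><b>Plus</b></h4>` is not protected by this pattern, so if the real HTML can contain markup inside headers, an HTML parser is likely to be more reliable than a regex.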