html regex vba regex-negation regexp-replace

RegEx replace only occurrences outside of <h> html tags


I would like to regex replace Plus in the below text, but only when it's not wrapped in a header tag:

<h4 class="Somethingsomething" id="something">Plus plan</h4>The <b>Plus</b> plan starts at $14 per person per month and comes with everything from Basic.

In the above I would like to replace the second "Plus" but not the first.

My regex attempt so far is:

(?!<h\d*>)\bPlus\b(?!<\\h>)

Meaning:

  1. Do not capture the following if it is inside a heading tag: <h, then one digit and zero or more characters, ending with a closing <\h>
  2. Capture only if the group "Plus" is surrounded by spaces or white space

However - this captures both occurrences. Can someone point out my mistake and correct this?
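The behaviour described above can be reproduced with a short sketch. It uses Python's `re` module purely as a stand-in for the VBA regex engine (an assumption made for illustration; the variable names are hypothetical), and the comments give one likely reading of why both occurrences are reported.

```python
import re

# Sample text copied from the question.
html = (
    '<h4 class="Somethingsomething" id="something">Plus plan</h4>'
    'The <b>Plus</b> plan starts at $14 per person per month '
    'and comes with everything from Basic.'
)

# The attempted pattern, unchanged.
attempt = re.compile(r'(?!<h\d*>)\bPlus\b(?!<\\h>)')

# A lookahead is a zero-width check at the current position only.
# (?!<h\d*>) asks whether the text starting right here begins with <h...>;
# at a match candidate that text always begins with "Plus", so the check
# always passes. (?!<\\h>) likewise only inspects the characters directly
# after "Plus", and it looks for a backslash rather than the slash of </h4>.
found = [m.group(0) for m in attempt.finditer(html)]
print(found)
```

Neither lookahead inspects the surrounding tag, so the heading text and the body text are treated the same way.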

I want to use this in VBA, but it should be a general regex question, as far as I understand.

Somewhat related but not addressing my problem in regex

Not relevant, as not RegEx


Solution

  • You can use

    \bPlus\b(?![^>]*<\/h\d+>)
    

    See the regex demo. To reuse the whole match inside the replacement pattern, use the $& backreference in your VBA code.

    Details:

    • \bPlus\b - a whole word Plus
    • (?![^>]*<\/h\d+>) - a negative lookahead that fails the match if, immediately to the right of the current location, there are
      • [^>]* - zero or more chars other than >
      • <\/h - a literal </h string (the / is escaped)
      • \d+ - one or more digits
      • > - a > char.
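As a rough check of the pattern above, here is a minimal sketch using Python's `re` module as a stand-in for VBA (an assumption; the variable names and the `<i>` wrapper in the replacement are hypothetical and only make the replaced occurrence visible). Python spells the whole-match reference `\g<0>` where VBA uses `$&`.

```python
import re

# Sample text copied from the question.
html = (
    '<h4 class="Somethingsomething" id="something">Plus plan</h4>'
    'The <b>Plus</b> plan starts at $14 per person per month '
    'and comes with everything from Basic.'
)

# Whole word "Plus" that is NOT followed by a closing heading tag
# with no ">" in between.
pattern = re.compile(r'\bPlus\b(?![^>]*</h\d+>)')

# Wrap each remaining match in a hypothetical <i> tag; \g<0> is the whole match.
result = pattern.sub(r'<i>\g<0></i>', html)
print(result)
```

Only the occurrence inside `<b>` is wrapped; the one in the `<h4>` heading is left alone. The lookahead relies on there being no `>` between the word and the closing heading tag, so a heading such as `<h4>Plus <b>plan</b></h4>` would still be matched. For markup like that, an HTML parser is probably the safer tool.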