
Find and exclude html-tags as whole words in negative lookbehind with regex


I am trying to find all paragraphs in a text (in JavaScript/jQuery) that are not yet wrapped in one of a defined set of HTML tags:

p|h1|h2|h3|h4|h5|h6|blockquote|img|table|iframe

My current regex (https://regex101.com/r/O4i2hP/1) already matches paragraphs and excludes the defined tags

(.+?(?<![</(p|h1|h2|h3|h4|h5|h6|blockquote|img|table|iframe)>]$))(\n|$)+/gm

but I can't work out how to match whole tags only.

The problem is:

[</(p|h1|h2|h3|h4|h5|h6|blockquote|img|table|iframe)>] is a character class, so it matches a single character in the list </(p|h123456blockquteimgafr)> (case sensitive) rather than a whole tag name.

Thus, as you can see from the example, text that is wrapped in other tags, such as <strong>TEXT</strong>, is also excluded.

I tried different things, such as word boundaries (\bword\b), but couldn't get it working. I hope you can help. Thanks.
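To make the character-class problem concrete, here is a small JavaScript sketch that tests the bracketed part of the lookbehind on its own. Only the pattern comes from the question; the sample lines and the names tagClass, samples and endsInClassChar are made up for illustration.

```javascript
// The bracketed part of the lookbehind from the question, tested by itself.
// Inside [...] the alternation is not a group: every character is a
// separate member of the class.
const tagClass = new RegExp(
  "[</(p|h1|h2|h3|h4|h5|h6|blockquote|img|table|iframe)>]$"
);

// Hypothetical sample lines.
const samples = [
  "<p>Already wrapped</p>",      // ends in ">"
  "<strong>TEXT</strong>",       // also ends in ">"
  "Plain text ending in g",      // "g" is in the class (it occurs in "img")
  "Plain text ending in a dot.", // "." is not in the class
];

// true means the last character of the line is a member of the class,
// which is exactly the case the negative lookbehind rejects.
const endsInClassChar = samples.map((line) => tagClass.test(line));
console.log(endsInClassChar); // logs: [ true, true, true, false ]
```

The class only ever looks at the final character, so a line wrapped in <strong> and even a plain line ending in "g" are rejected just like a line wrapped in <p>.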


Solution

  • This will do it.

    ^(?!<(p|h1|h2|h3|h4|h5|h6|blockquote|img|table|iframe)+?>.*</\1>).*$
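A rough sketch of how this pattern might be applied in JavaScript, written with .* after the opening tag and before the end of the line, and with the g and m flags so that ^ and $ work per line. The sample text and the names unwrappedLine, text and unwrapped are hypothetical.

```javascript
// Negative lookahead: skip any line that starts with one of the listed
// tags and has the matching closing tag (\1) later on the same line.
const unwrappedLine =
  /^(?!<(p|h1|h2|h3|h4|h5|h6|blockquote|img|table|iframe)+?>.*<\/\1>).*$/gm;

// Hypothetical input, one paragraph per line, including a blank line.
const text = [
  "<p>Already wrapped</p>",
  "Plain paragraph",
  "",
  "<strong>TEXT</strong>",
  "<h2>Heading</h2>",
].join("\n");

// The pattern also matches blank lines as empty strings, so drop those.
const unwrapped = (text.match(unwrappedLine) || []).filter(
  (line) => line.trim() !== ""
);

console.log(unwrapped);
// logs: [ 'Plain paragraph', '<strong>TEXT</strong>' ]
```

Note that this only skips lines where the opening tag has no attributes and the closing tag sits on the same line, so something like <img src="..."> or a table spread over several lines would still be reported as unwrapped.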