I'm trying to use a regex (PCRE2 dialect) to find all occurrences of the word 'apple' in an HTML text, excluding those where the word is part of a link.
I'm quite a beginner with regex, so I'm probably making a simple mistake. This is what I have so far:
\bapple\b
In the following text, the regex should match the first occurrence but not the second and third ones.
Lorem ipsum apple sit amet, consectetur <a href="#">apple</a> elit <a href="/test/apple">lorem</a>.
What am I doing wrong?
In PCRE, you may use this regex:
~(?is)<a .*?</a>(*SKIP)(*F)|\bapple\b~
RegEx Details:
(?is): Enable ignore-case and DOTALL modes
<a .*?</a>: Match text from <a to </a>, so that every <a> tag is skipped
(*SKIP)(*F): Together these make the regex discard what was just matched and continue after it, a nice workaround for the restriction that PCRE does not allow variable-length lookbehind
|: OR
\bapple\b: Match the word apple
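For readers who want to try the idea outside PCRE, here is a minimal sketch in Python. Python's built-in re module does not support (*SKIP)(*F), so this swaps in a related technique rather than the regex above: the left branch of the alternation consumes whole links, the right branch captures the word, and only matches where the capture group participated are kept. The names PATTERN and find_outside_links are made up for illustration.

```python
import re

# Left branch: consume a whole <a ...>...</a> element (nothing is captured).
# Right branch: capture the standalone word "apple" in group 1.
# IGNORECASE and DOTALL mirror the (?is) flags of the PCRE pattern.
PATTERN = re.compile(r"<a .*?</a>|(\bapple\b)", re.IGNORECASE | re.DOTALL)


def find_outside_links(html):
    """Return (start, end) spans of 'apple' that are not inside an <a> element."""
    # Hypothetical helper: keep only matches where group 1 actually matched.
    return [m.span(1) for m in PATTERN.finditer(html) if m.group(1) is not None]


text = (
    'Lorem ipsum apple sit amet, consectetur <a href="#">apple</a> '
    'elit <a href="/test/apple">lorem</a>.'
)
spans = find_outside_links(text)
print(spans)  # [(12, 17)]
```

Only the first apple is reported; the two occurrences inside the links are swallowed by the left branch. Like the PCRE pattern, this assumes the links are not nested and that the tag is written as <a followed by a space.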