regex, regex-negation, regex-group

Regex: match an expression except for a specific string (no negative lookahead)


I'm trying to write a regex that matches most HTML elements, for example:

<script></script>

I would like to make an exception for the following HTML tag specifically:

<b> 

I don't want that tag to be captured. Is there a way to do this without using negative lookahead/lookbehind?

At the moment I have something like this:

((\%3C)|<)[^<b]((\%2F)|\/)*[^<\/b][a-z0-9\%\=\'\(\)\ ]+((\%3E)|>)

https://regex101.com/r/ZxkVMJ/2

It does work, but besides

<b>

it also fails to capture any one-character tag

(<a>, for example)

as well as longer tags that start with b, for example

<balloon>
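
The behaviour described above can be reproduced with a short sketch. It assumes Python's `re` module, which may not be the flavour used in the regex101 link, and the names `QUESTION_PATTERN`, `is_captured` and `results` are made up for illustration. The likely cause is that `[^<b]` and `[^<\/b]` are character classes: each excludes the single characters listed, not the string `<b`. On top of that, the pattern consumes at least three characters between `<` and `>`.

```python
import re

# The pattern from the question, unchanged.
QUESTION_PATTERN = re.compile(
    r"((\%3C)|<)[^<b]((\%2F)|\/)*[^<\/b][a-z0-9\%\=\'\(\)\ ]+((\%3E)|>)"
)


def is_captured(tag: str) -> bool:
    """Return True if the question's pattern matches anywhere in `tag`."""
    return QUESTION_PATTERN.search(tag) is not None


# [^<b] rejects every tag whose first character is 'b', so <b> and <balloon>
# both fail. <a> fails because [^<\/b] and the following "+" each need a
# character of their own after the first one.
results = {tag: is_captured(tag) for tag in ["<script>", "<b>", "<a>", "<balloon>"]}

for tag, captured in results.items():
    print(f"{tag!r}: {'captured' if captured else 'not captured'}")
```

Only `<script>` is reported as captured here, which matches the symptoms listed above.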

Thank you for any help


Solution

  • As a disclaimer, if any kind of XML/HTML parser is available to you, you should really use it for this problem. If you are forced to use regex here, then consider this pattern:

    <([^b][^>]*|b[^>]+)>.*?<\/\1>
    

    This matches an element whose tag name either starts with a character other than b, or starts with b and is then followed by one or more other characters (thus ruling out <b>). The backreference \1 then requires a closing tag that repeats exactly what was captured in the opening tag. Here is a working demo:

    Demo
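
As a rough way to try the pattern locally, here is a minimal sketch assuming Python's `re` module (the original demo may use a different regex flavour). The names `ANSWER_PATTERN`, `find_elements` and `sample` are hypothetical and chosen only for this example.

```python
import re

# The pattern from the answer, unchanged. Group 1 captures everything between
# "<" and ">" in the opening tag, and \1 requires the closing tag to repeat it.
ANSWER_PATTERN = re.compile(r"<([^b][^>]*|b[^>]+)>.*?<\/\1>")


def find_elements(html: str) -> list[str]:
    """Return every full element matched by the answer's pattern."""
    return [m.group(0) for m in ANSWER_PATTERN.finditer(html)]


sample = "<script></script> <b>bold</b> <a>link</a> <balloon>up</balloon>"
matches = find_elements(sample)
print(matches)
# ['<script></script>', '<a>link</a>', '<balloon>up</balloon>']

# Limitation: attributes end up inside group 1, so the closing tag no longer
# equals the captured text and the element is not matched.
print(find_elements('<div class="x">hi</div>'))
# []
```

Two caveats follow from how the pattern is written: an opening tag with attributes is not matched, because `\1` would have to reappear verbatim in the closing tag, and `.*?` does not cross line breaks unless the `re.DOTALL` flag is added. Both are further reasons to prefer a real parser when one is available.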