I'm trying to use a regex to filter forbidden HTML tags out of a given string. Yes, I know I'm supposed to use a parser instead, but for this specific problem it's faster this way.
The idea is to whitelist every tag that is okay (e.g. <span>, <b>, <br>) and match the forbidden ones. So far I've come up with the following expression: <\/?(?!(span|b|br)).\>
It works well for single-character tags like <a>, but longer tags like <label> are not matched. I'd really appreciate some help. Thanks in advance!
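The failure can be reproduced quickly. The sketch below uses Python's `re` module, which is an assumption since the question names no language; `QUESTION_PATTERN` is a hypothetical name for the expression above, copied unchanged. After the lookahead, `.` consumes exactly one character and `\>` then requires the closing bracket, so only one-letter tag names can ever match:

```python
import re

# The expression from the question, unchanged.
QUESTION_PATTERN = re.compile(r"<\/?(?!(span|b|br)).\>")

for tag in ["<a>", "</a>", "<label>", "<span>"]:
    # fullmatch: is the whole tag recognised as "forbidden"?
    print(tag, bool(QUESTION_PATTERN.fullmatch(tag)))

# Prints:
# <a> True
# </a> True
# <label> False   ('.' eats 'l', then '\>' fails on 'a')
# <span> False    (whitelisted by the lookahead, as intended)
```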
This regex matches every tag except the opening and closing span, br and b tags.
It also skips those whitelisted tags when they carry attributes (separated from the tag name by a space):
<\/?(?!(?:span|br|b)(?: [^>]*)?>)[^>\/]*>
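Below is a minimal Python sketch of applying the pattern, assuming "filtering" means deleting the forbidden tags with `re.sub`; `strip_forbidden_tags`, `sample`, `link` and `VARIANT_PATTERN` are hypothetical names. The variant is not part of this answer. It is one possible adjustment for a limitation of the original: the final class `[^>\/]*` stops at any `/`, so a forbidden tag whose attributes contain a slash (for example a URL in `href`) is not matched at all and survives the filter.

```python
import re

# Pattern from this answer: matches any opening or closing tag except
# span, br and b (whitelisted tags are skipped even with attributes).
ANSWER_PATTERN = re.compile(r"<\/?(?!(?:span|br|b)(?: [^>]*)?>)[^>\/]*>")

# Hypothetical variant (not from the answer): the optional "/" moves inside
# the lookahead, so the tag body can be [^>]* and may contain "/".
# Whitelisted <br/> and <br /> are still skipped.
VARIANT_PATTERN = re.compile(r"<(?!\/?(?:span|br|b)(?:\s[^>]*)?\/?>)[^>]*>")


def strip_forbidden_tags(html: str, pattern: re.Pattern = ANSWER_PATTERN) -> str:
    """Delete every tag the pattern matches; whitelisted tags are left alone."""
    return pattern.sub("", html)


sample = '<label>Name</label> <span class="x">ok</span><br><b>bold</b><div>x</div>'
print(strip_forbidden_tags(sample))
# Name <span class="x">ok</span><br><b>bold</b>x

link = '<a href="http://example.com">link</a>'
print(strip_forbidden_tags(link))                   # <a href="http://example.com">link
print(strip_forbidden_tags(link, VARIANT_PATTERN))  # link
```

The `/` exclusion in the original is not arbitrary: it stops `</span>` from being matched by backtracking `\/?` to empty and then consuming `/span` as a tag body. Putting `\/?` inside the lookahead removes that backtracking path, which is what allows the variant to accept slashes. Neither pattern handles a `>` inside an attribute value.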