I have a RegEx that does not match using PHP (nor regex101.com) but does with MariaDB. Its purpose is to search for HTML classes in XML values (the HTML is encoded).
Here is an example XML value where you can see a <ul>
element with a liste--non-ordonnee--gros--exergue
CSS class:
<ul class="liste--non-ordonnee--gros--exergue">
I want the RegEx to match only full classes. Therefore if I search --exergue
I don't want it to match.
Using PHP or other PCRE/PCRE2 online tester it does not match:
~(class="(?:[^&]*\s)?)--exergue~sU
But using MariaDB (v10.2.40 - PCRE 8.42), it matches:
(?sU)(class="(?:[^&]*\s)?)--exergue
It looks for a class
attribute containing the class to replace. I tried to change the class name to something else for demonstration purposes (searching --suffix
in class-with--suffix
) but then it would not match anymore on the MariaDB version.
What is wrong with my RegEx or its MariaDB version ?
I am aware that regular expressions should not be used with HTML and am open to alternatives but this is TYPO3: storing encoded HTML into XML values in a db column. Design changes require massive class renaming.
First, the very short MCVE of your case:
SELECT 'class="s--e' REGEXP '(?sU)(class="(?:[^&]*\s)?)--e'
MariaDB matches, PHP doesn't match (demo). Why? As in MariaDB's manual for REGEXP:
Note: Because MariaDB uses the C escape syntax in strings (for example, "\n" to represent the newline character), you must double any "
\
" that you use in your REGEXP strings.
Your issue is caused by the \s
that should be double-escaped as \\s
in your MariaDB query. Once you fix that, the PHP and SQL regex statements become equivalent and behave the same.
This won't match (with --e
not led by space):
SELECT 'class="s--e' REGEXP '(?sU)(class="(?:[^&]*\\s)?)--e';
This will match (with --e
led by space):
SELECT 'class="s --e' REGEXP '(?sU)(class="(?:[^&]*\\s)?)--e';