
RegEx failing since PHP 7.3, working in 7.2


Any ideas why this preg_match works up to PHP 7.2 but fails with 7.3+?

$word = 'umweltfreundilch'; //real life example :/
preg_match('/^(?U)(.*(?:[aeiouyäöü])(?:[^aeiouyäöü]))(?X)(.*)$/u', $word, $matches);
var_dump($matches);

Warning: preg_match(): Compilation failed: unrecognized character after (? or (?-

PHP 7.2 and below output:

array(3) {
  [0]=>
  string(16) "umweltfreundilch"
  [1]=>
  string(2) "um"
  [2]=>
  string(14) "weltfreundilch"
}

The RegEx seems to be OK, doesn't it?
https://regex101.com/r/LGdhaM/1


Solution

  • In PHP 7.3 and later, the Perl-Compatible Regular Expressions (PCRE) extension was upgraded to PCRE2.

    The PCRE2 syntax documentation does not list (?X) as an available inline modifier option. Here are the supported options:

      (?i)            caseless
      (?J)            allow duplicate named groups
      (?m)            multiline
      (?n)            no auto capture
      (?s)            single line (dotall)
      (?U)            default ungreedy (lazy)
      (?x)            extended: ignore white space except in classes
      (?xx)           as (?x) but also ignore space and tab in classes
      (?-...)         unset option(s)
      (?^)            unset imnsx options
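
To check which PCRE generation a given PHP build uses, and whether a particular inline modifier still compiles, something like the following sketch may help. It relies on the built-in PCRE_VERSION constant; the helper name pattern_compiles is hypothetical and introduced only for illustration.

```php
<?php
// PCRE_VERSION is a built-in string constant that starts with "major.minor".
// PCRE2 releases are numbered 10.x and up; the old PCRE1 library stopped at 8.x.
$pcreMajor = (int) explode('.', PCRE_VERSION)[0];
$usesPcre2 = $pcreMajor >= 10;

// Hypothetical helper: returns true if the pattern compiles.
function pattern_compiles(string $pattern): bool
{
    // preg_match() returns false when compilation fails;
    // @ suppresses the accompanying warning.
    return @preg_match($pattern, '') !== false;
}

var_dump($usesPcre2);
var_dump(pattern_compiles('/(?X)a/'));  // expected to fail under PCRE2
var_dump(pattern_compiles('/(?-U)a/')); // listed as supported above
```

On a PCRE2-based build (PHP 7.3+), the (?X) check should report false, matching the warning shown in the question.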
    

    However, you can still use the X flag after the trailing delimiter:

    preg_match('/^(?U)(.*[aeiouyäöü][^aeiouyäöü])(.*)$/Xu', $word, $matches);
    

    See PHP 7.4 demo.

    To cancel the effect of (?U), you can use either of two options. The first is a (?-U) inline modifier:

    preg_match('/^(?U)(.*[aeiouyäöü][^aeiouyäöü])(?-U)(.*)$/u', $word, $matches);
    //                                           ^^^^^
    

    Alternatively, enclose the affected part of the pattern in a (?U:...) modifier group:

    preg_match('/^(?U:(.*[aeiouyäöü][^aeiouyäöü]))(.*)$/u', $word, $matches);
    //            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^        
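
As a quick sanity check, both variants can be run side by side against the word from the question. This is only a sketch; the array keys and the $results variable are illustrative names, and the source file is assumed to be saved as UTF-8 so that the /u flag handles the umlauts correctly.

```php
<?php
$word = 'umweltfreundilch';

// The two ways of limiting the ungreedy (?U) behaviour to the first group.
$patterns = [
    'inline'  => '/^(?U)(.*[aeiouyäöü][^aeiouyäöü])(?-U)(.*)$/u',
    'grouped' => '/^(?U:(.*[aeiouyäöü][^aeiouyäöü]))(.*)$/u',
];

$results = [];
foreach ($patterns as $label => $pattern) {
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    if (preg_match($pattern, $word, $matches) === 1) {
        $results[$label] = $matches;
        printf("%s: [%s] [%s]\n", $label, $matches[1], $matches[2]);
    }
}
// prints:
// inline: [um] [weltfreundilch]
// grouped: [um] [weltfreundilch]
```

Both patterns should reproduce the PHP 7.2 output from the question: the lazy first group stops at the first vowel followed by a non-vowel, and the greedy second group takes the rest.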
    

    For more about the changes to regex handling in PHP 7.3+, see "preg_match(): Compilation failed: invalid range in character class at offset".