
Regex in PHP returning preg_match(): Compilation failed: PCRE does not support \L, \l, \N{name}, \U, or \u


I'm trying to validate user input in PHP using regex, but I can't figure out what's wrong with my pattern.

Here's my if statement:

if (is_numeric($_SESSION['l-teacher'])
    && preg_match('/^[A-Za-z0-9\u0590-\u05ff\*\-\.\, ]+$/', $_POST['content'])
    && preg_match('/^[\u0590-\u05fe ]+$/', $_POST['name'])
    && is_numeric($_POST['stars'])
    && $_POST['stars'] > 0
    && $_POST['stars'] < 6
) {
    // if true
}

I'm getting the following error:

Warning: preg_match(): Compilation failed: PCRE does not support \L, \l, \N{name}, \U, or \u at offset 12


Solution

  • "PCRE" stands for "Perl-Compatible Regular Expressions", but that doesn't mean that all features available in Perl5 regexes are available in PCRE. The PHP manual has a page on PCRE: Differences from Perl, which includes a similar statement to the one in the error message:

    The following Perl escape sequences are not supported: \l, \u, \L, \U. In fact these are implemented by Perl's general string-handling and are not part of its pattern matching engine.

    PHP (since 7.0) does support \u{...} escapes in double-quoted strings, so "\u{0590}" produces that character, encoded as UTF-8 bytes. On its own, though, that might not have the desired effect inside the regex: without UTF-8 mode, the character class sees a set of individual 8-bit values rather than a range of Unicode code points.

    What you actually want in this case is the PCRE notation for Unicode codepoints, which is described under Escape Sequences:

    In UTF-8 mode, "\x{...}" is allowed, where the contents of the braces is a string of hexadecimal digits. It is interpreted as a UTF-8 character whose code number is the given hexadecimal number.

    The mention of "UTF-8 mode" refers to the u pattern modifier:

    This modifier turns on additional functionality of PCRE that is incompatible with Perl. Pattern and subject strings are treated as UTF-8. An invalid subject will cause the preg_* function to match nothing; an invalid pattern will trigger an error of level E_WARNING.

    So I believe your pattern of:

    '/^[\u0590-\u05fe ]+$/'
    

    should be changed to:

    '/^[\x{0590}-\x{05fe} ]+$/u'
    

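    The \u0590-\u05ff range in the pattern for $_POST['content'] needs the same change, since it triggers the same compilation error.

    Note that, as the manual for the u modifier implies, the subject string must be encoded as UTF-8 for this to work; there is no support for UTF-16 or any other Unicode encoding.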
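
As a rough sketch of how both checks from the question might look after the fix, the snippet below wraps the corrected patterns in two helper functions. The names isValidContent and isValidName are hypothetical and not from the original code. The sample subjects are built with PHP's "\u{...}" string escape, which assumes PHP 7.0 or later.

```php
<?php
// Hypothetical helpers: the names are illustrative only. The patterns are the
// ones from the question, rewritten with PCRE's \x{...} notation and the
// u modifier (UTF-8 mode).

function isValidContent(string $content): bool
{
    // Latin letters, digits, U+0590..U+05FF, and the characters * - . , space
    return preg_match('/^[A-Za-z0-9\x{0590}-\x{05ff}*\-., ]+$/u', $content) === 1;
}

function isValidName(string $name): bool
{
    // Only U+0590..U+05FE and spaces
    return preg_match('/^[\x{0590}-\x{05fe} ]+$/u', $name) === 1;
}

// Four Hebrew letters (shin, lamed, vav, final mem), written with the PHP 7+
// "\u{...}" string escape so the subject is valid UTF-8.
$hebrew = "\u{05E9}\u{05DC}\u{05D5}\u{05DD}";

var_dump(isValidName($hebrew));                  // bool(true)
var_dump(isValidName('hello'));                  // bool(false): Latin letters are outside the range
var_dump(isValidContent($hebrew . ', abc 1.5*')); // bool(true)

// A subject that is not valid UTF-8 makes preg_match() fail under the u modifier.
$result = preg_match('/^[\x{0590}-\x{05fe} ]+$/u', "\xff");
var_dump($result);                                   // bool(false)
var_dump(preg_last_error() === PREG_BAD_UTF8_ERROR); // bool(true)
```

The patterns stay in single quotes so that PHP passes \x{0590} through to PCRE untouched. Comparing the result with === 1 keeps a failed match (0) and an error (false) from being mistaken for success.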