I'm trying to check for user input validity in PHP using regex, but I just can't figure out what's wrong with my regex.
Here's my if statement:
if(is_numeric($_SESSION['l-teacher'])&&preg_match('/^[A-Za-z0-9\u0590-\u05ff\*\-\.\, ]+$/',$_POST['content'])&&preg_match('/^[\u0590-\u05fe ]+$/',$_POST['name'])&&is_numeric($_POST['stars'])&&$_POST['stars']>0&&$_POST['stars']<6){
    // if true
}
I'm getting the following error:
Warning: preg_match(): Compilation failed: PCRE does not support \L, \l, \N{name}, \U, or \u at offset 12
"PCRE" stands for "Perl-Compatible Regular Expressions", but that doesn't mean that all features available in Perl5 regexes are available in PCRE. The PHP manual has a page on PCRE: Differences from Perl, which includes a similar statement to the one in the error message:
The following Perl escape sequences are not supported: \l, \u, \L, \U. In fact these are implemented by Perl's general string-handling and are not part of its pattern matching engine.
PHP (since 7.0) does support \u escapes in double-quoted strings, so "\u{0590}" would represent that character. On its own, though, that might not have the desired effect inside the regex, since you need to tell the character class somehow that you want a range of Unicode code points, not a set of possible 8-bit values.
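To illustrate the point about 8-bit values, here is a small sketch. The variable names and test strings are made up for the example. PHP expands "\u{0590}" into UTF-8 bytes before PCRE ever sees the pattern, so without UTF-8 mode the character class is read one byte at a time:

```php
<?php
// PHP (7.0+) expands "\u{...}" inside double quotes into UTF-8 bytes.
// U+0590 becomes D6 90 and U+05FE becomes D7 BE.
$bytePattern = "/^[\u{0590}-\u{05fe} ]+$/";
$utf8Pattern = $bytePattern . 'u';   // same pattern, with UTF-8 mode enabled

$eAcute = "\u{e9}";                              // "é", bytes C3 A9
$hebrew = "\u{05e9}\u{05dc}\u{05d5}\u{05dd}";    // the word "shalom" in Hebrew letters

var_dump(bin2hex("\u{0590}"));                // string(4) "d690"

// Without the u modifier the class means: byte D6, byte range 90-D7, byte BE, space.
// Both bytes of "é" fall inside 90-D7, so it is wrongly accepted.
var_dump(preg_match($bytePattern, $eAcute));  // int(1)

// With the u modifier the class is a real code point range.
var_dump(preg_match($utf8Pattern, $eAcute));  // int(0)
var_dump(preg_match($utf8Pattern, $hebrew));  // int(1)
```

So the double-quoted escape can work, but only together with the u modifier; the \x{...} notation described next keeps the pattern in a single-quoted string and is easier to read.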
What you actually want in this case is the PCRE notation for Unicode codepoints, which is described under Escape Sequences:
In UTF-8 mode, "\x{...}" is allowed, where the contents of the braces is a string of hexadecimal digits. It is interpreted as a UTF-8 character whose code number is the given hexadecimal number.
The mention of "UTF-8 mode" refers to the u pattern modifier:
This modifier turns on additional functionality of PCRE that is incompatible with Perl. Pattern and subject strings are treated as UTF-8. An invalid subject will cause the preg_* function to match nothing; an invalid pattern will trigger an error of level E_WARNING.
So I believe your pattern of:
'/^[\u0590-\u05fe ]+$/'
should be changed to:
'/^[\x{0590}-\x{05fe} ]+$/u'
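The other pattern in the question (the one applied to $_POST['content']) uses \u as well, so it presumably needs the same treatment. A minimal sketch of both checks, assuming the input is UTF-8; the helper names isValidName and isValidContent are hypothetical and not from the question:

```php
<?php
// Hypothetical helper: Hebrew letters and spaces only.
function isValidName(string $name): bool
{
    return preg_match('/^[\x{0590}-\x{05fe} ]+$/u', $name) === 1;
}

// Hypothetical helper: the question's "content" class, with \u rewritten as \x{...}.
function isValidContent(string $content): bool
{
    return preg_match('/^[A-Za-z0-9\x{0590}-\x{05ff}\*\-\.\, ]+$/u', $content) === 1;
}

$hebrew = "\u{05e9}\u{05dc}\u{05d5}\u{05dd}";   // the word "shalom" in Hebrew letters

var_dump(isValidName($hebrew));                 // bool(true)
var_dump(isValidName('John'));                  // bool(false)
var_dump(isValidContent("Great $hebrew, 5*"));  // bool(true)
```

Comparing against 1 with === keeps a failed match (0) and an error (false) from being mistaken for success.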
Note that, as the manual for the u modifier implies, the subject string must be encoded as UTF-8 for this to work; there is no support for UTF-16 or any other Unicode encoding.
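As a rough illustration of what happens when the subject is not UTF-8: in the sketch below the bytes are meant to be the same Hebrew word in a legacy single-byte encoding (ISO-8859-8 is assumed here). In practice preg_match() returns false rather than 0 in this case, and preg_last_error() reports the reason:

```php
<?php
$pattern = '/^[\x{0590}-\x{05fe} ]+$/u';

// Assumed ISO-8859-8 bytes for the Hebrew word "shalom"; not a valid UTF-8 sequence.
$legacy = "\xF9\xEC\xE5\xED";

$result = preg_match($pattern, $legacy);
$error  = preg_last_error();

var_dump($result);                         // bool(false)
var_dump($error === PREG_BAD_UTF8_ERROR);  // bool(true)
```

If the form data might arrive in another encoding, it would need to be converted to UTF-8 before validation.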