I am using preg_replace and preg_match with PHP, working in the Cyrillic Windows-1251 charset.
I am trying to match a word using the case-insensitive modifier.
I ran these tests:
$pattern = '/myCyrillicWord1|myCyrillicWord2/i';
$subject = 'Am I able to find MYCyrILlicWord1 or not';
$res = preg_replace($pattern, 'matched', $subject);
With UTF-8:
With the UTF-8 modifier (u) in the pattern:
$pattern = '/myCyrillicWord1|myCyrillicWord2/iu';
$output = 'Am I able to find matched or not';
Without it:
$pattern = '/myCyrillicWord1|myCyrillicWord2/i';
$output = 'Am I able to find MYCyrILlicWord1 or not';
With Windows-1251:
$pattern = '/myCyrillicWord1|myCyrillicWord2/i';
$output = 'Am I able to find MYCyrILlicWord1 or not';
The regex works with UTF-8 but not with Windows-1251. Please note that I tested with Cyrillic characters such as 'х' and 'Х' (which look like the Latin letters 'x' and 'X').
Is this behavior normal?
How can I match my Cyrillic words in the Windows-1251 charset with the case-insensitive modifier?
Many thanks.
I don't think PCRE is aware of single-byte charsets such as Windows-1251, so your option is basically to list both cases of each letter explicitly:
/[Дд][Ыы][Кк]/
to match Дык, дыК, etc.
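Writing those character classes by hand gets tedious, so here is a minimal sketch of generating them. The helper name cp1251CaseInsensitivePattern is hypothetical. It assumes the source file is saved as UTF-8, the mbstring extension is loaded, PHP is 7.4 or later (for mb_str_split), and every letter has a single-character upper- and lowercase form in Windows-1251.

```php
<?php
// Hypothetical helper (not part of PHP): builds a pattern for a Windows-1251
// subject by listing both cases of every letter, so the /i modifier is not needed.
// Assumes the words are given as UTF-8 strings and mbstring is available.
function cp1251CaseInsensitivePattern(array $utf8Words): string
{
    $alternatives = [];
    foreach ($utf8Words as $word) {
        $piece = '';
        foreach (mb_str_split($word, 1, 'UTF-8') as $char) {
            // Case-fold in UTF-8, then convert each form to its single Windows-1251 byte.
            $lower = mb_convert_encoding(mb_strtolower($char, 'UTF-8'), 'Windows-1251', 'UTF-8');
            $upper = mb_convert_encoding(mb_strtoupper($char, 'UTF-8'), 'Windows-1251', 'UTF-8');
            // Characters without case (digits, spaces) are emitted as escaped literals.
            $piece .= $lower === $upper
                ? preg_quote($lower, '/')
                : '[' . preg_quote($lower, '/') . preg_quote($upper, '/') . ']';
        }
        $alternatives[] = $piece;
    }
    return '/' . implode('|', $alternatives) . '/';
}

// Usage: the subject is converted to Windows-1251 here only to simulate real input.
$pattern = cp1251CaseInsensitivePattern(['дык']);
$subject = mb_convert_encoding('Ну ДыК же', 'Windows-1251', 'UTF-8');
$result  = preg_replace($pattern, 'matched', $subject);

// Convert back to UTF-8 purely for display.
echo mb_convert_encoding($result, 'UTF-8', 'Windows-1251'), PHP_EOL;
```

The generated pattern contains raw Windows-1251 bytes and no u modifier, so it should only be applied to subjects in that same encoding.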