php, utf-8, preg-split

PHP preg_split and UTF-8 symbols


Could anybody explain why this code

$string='6аd_ТЕХТ GOOD_TEXT';
$words = preg_split('/\s+/', $string, NULL, PREG_SPLIT_NO_EMPTY);

var_dump($words);

displays

array(2) { [0]=> string(8) "6àd_ÒÅÕÒ" [1]=> string(9) "GOOD_TEXT" }

instead of

array(2) { [0]=> string(8) "6аd_ТЕХТ" [1]=> string(9) "GOOD_TEXT" }

I've read about this issue, but adding the /u modifier, i.e. changing

preg_split('/\s+/', $string, NULL, PREG_SPLIT_NO_EMPTY);// '/\s+/'

to become

preg_split('/\s+/u', $string, NULL, PREG_SPLIT_NO_EMPTY);// '/\s+/u'

doesn't help. How can I fix this?

Thank you.


Solution

  • Something else is happening in your code that isn't shown in the example. I tested the example as posted and it works as expected. On the off chance that this really is happening (and no other code touches $string), it may be a bug in the specific PHP version you're using, which upgrading PHP would resolve, although a PHP bug is highly unlikely.
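One way to narrow this down is to check the bytes rather than the rendered text. The sketch below is only a diagnostic aid, not a confirmed fix. It assumes PHP 7.0 or later (for the "\u{...}" escapes) and builds the question's string from explicit code points so the result does not depend on how the script file is saved. It reads the non-ASCII letters in the question as Cyrillic, which is an assumption. If a real script reports 8 bytes for the first word, as in the question's output, the text is probably in a single-byte encoding rather than UTF-8, and the source file encoding or the output charset would be the first thing to check.

```php
<?php
// Build the string from code points so the file's own encoding is irrelevant.
// Assumption: the non-ASCII letters in the question are Cyrillic.
$string = "6\u{0430}d_\u{0422}\u{0415}\u{0425}\u{0422} GOOD_TEXT";

// Same call as in the question, with -1 as the "no limit" value.
$words = preg_split('/\s+/u', $string, -1, PREG_SPLIT_NO_EMPTY);

// strlen() counts bytes. In UTF-8 each Cyrillic letter takes 2 bytes,
// so the first word is 13 bytes long here, not 8.
$firstLength = strlen($words[0]);

// An empty pattern with the /u modifier matches only valid UTF-8 subjects.
$isUtf8 = preg_match('//u', $words[0]) === 1;

// bin2hex() shows the raw bytes, independent of how a browser or
// terminal decodes them.
var_dump($firstLength, $isUtf8, bin2hex($words[0]));
```

Running the same three checks on the real $string shows whether the data is already non-UTF-8 before preg_split is called, or whether only the display is wrong.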