I get a string from the database, stored in a column with the utf8_unicode_ci collation. It might contain the middle dot character (⋅), and I have to detect it using strcmp. If I output the string directly in the HTML, the character is displayed without problem, but when I do the comparison, the result is not what I expect.
For example:
$string = "⋅⋅⋅ This string starts with middle dots";
$result = strcmp(substr($string , 0, 2), "⋅⋅");
The result is not 0, as I think it should be. The PHP file is saved with UTF-8 encoding. What am I missing here? This happens even if I take the string from a variable instead of the database.
PHP's substr works on bytes, not characters, so it does not treat a multibyte UTF-8 character as a single character.
The dot you're using (U+22C5 DOT OPERATOR, not U+00B7 MIDDLE DOT) is actually 3 bytes in UTF-8: 0xE2 0x8B 0x85.
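To see this for yourself, here is a small sketch (assuming the PHP file is saved as UTF-8, as in the question) that prints the byte length and hex bytes of a single dot, and what substr($string, 0, 2) actually returns:

```php
<?php
// The dot character from the question: U+22C5 DOT OPERATOR.
$dot = "⋅";

// strlen() counts bytes, not characters: one dot is three bytes in UTF-8.
var_dump(strlen($dot));   // int(3)
var_dump(bin2hex($dot));  // string(6) "e28b85"

// substr($string, 0, 2) therefore keeps only the first two bytes of the first
// dot, an incomplete UTF-8 sequence that can never equal "⋅⋅" (six bytes).
$string = "⋅⋅⋅ This string starts with middle dots";
var_dump(bin2hex(substr($string, 0, 2)));  // string(4) "e28b"
```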
So either use mb_substr, which counts characters, or keep substr and pass a length of 6 bytes (2 dots × 3 bytes each):
<?php
$string = "⋅⋅⋅ This string starts with middle dots";
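// mb_substr() counts characters; passing 'UTF-8' explicitly avoids depending on mbstring's internal encoding setting
$result = strcmp(mb_substr($string, 0, 2, 'UTF-8'), "⋅⋅");
var_dump($result); // int(0)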
Or, if the mb_* functions don't exist (the mbstring extension is not installed):
<?php
$string = "⋅⋅⋅ This string starts with middle dots";
// substr() counts bytes: 2 dots × 3 bytes each = 6 bytes
$result = strcmp(substr($string, 0, 6), "⋅⋅");
var_dump($result); // int(0)
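If you'd rather not hard-code the byte count, one option (a sketch, not the only approach) is to let strlen() measure the prefix in bytes and compare only that many bytes with strncmp(). str_starts_with() does the same kind of byte-wise prefix check but only exists on PHP 8.0 and later, so the sketch checks for it first:

```php
<?php
$string = "⋅⋅⋅ This string starts with middle dots";
$prefix = "⋅⋅";

// strncmp() compares at most strlen($prefix) bytes, so the prefix's byte
// length is computed instead of being hard-coded as 6.
$result = strncmp($string, $prefix, strlen($prefix));
var_dump($result); // int(0)

// On PHP 8.0+, str_starts_with() performs the same byte-wise prefix check.
if (function_exists('str_starts_with')) {
    var_dump(str_starts_with($string, $prefix)); // bool(true)
}
```

Both approaches work on raw bytes, so they don't need mbstring; they only require that the string and the prefix use the same encoding (UTF-8 here).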