I am slicing a Unicode string with diacritics using the mb_substr function, but it behaves as if I were using the plain substr function: it splits Unicode characters in half and displays a question-mark diamond (�).
E.g.
echo mb_substr('ááááá', 0, 5); //Displays áá�
What might be wrong?
I have the same problem if I don't specify the encoding as the last parameter to mb_substr: it defaults, at least on my server, to ISO-8859-1.
But if I set the encoding properly, to UTF-8, it works OK:
echo mb_substr('ááááá', 0, 5, 'UTF-8');
This displays correctly in the browser:
ááááá
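To see why the call without an encoding cuts a character in half, it can help to compare byte counts with character counts. This is only a minimal sketch, assuming the mbstring extension is loaded; the variable names are made up for illustration, and the string is built from raw UTF-8 bytes so the result does not depend on how the source file is saved:

```php
<?php
// "á" (U+00E1) is two bytes in UTF-8: 0xC3 0xA1. Build "ááááá" from raw bytes.
$text = str_repeat("\xC3\xA1", 5);

$byteCount = strlen($text);             // counts bytes: 10
$charCount = mb_strlen($text, 'UTF-8'); // counts characters: 5

// A byte-oriented slice of length 5 keeps two whole characters plus
// the first byte of the third, which browsers render as "áá�".
$byteSlice = substr($text, 0, 5);

echo $byteCount, ' ', $charCount, ' ', bin2hex($byteSlice), "\n";
// prints "10 5 c3a1c3a1c3"
```

With a single-byte encoding such as ISO-8859-1, mb_substr treats every byte as one character, so it produces the same broken slice as substr.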
See the documentation for mb_substr (quoting, emphasis mine):
string mb_substr ( string $str , int $start [, int $length [, string $encoding ]] )
The encoding parameter is the character encoding. If it is omitted, the internal character encoding value will be used.
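Since the omitted parameter falls back to the internal character encoding, another option is to set that value once instead of passing 'UTF-8' on every call. The following is a hedged sketch rather than a definitive fix: it assumes the mbstring extension is loaded and that setting the encoding at the top of the script suits your application; the variable names are illustrative.

```php
<?php
// Raw UTF-8 bytes for "ááááá", so the file's own encoding does not matter.
$text = str_repeat("\xC3\xA1", 5);

// Show what calls without an explicit encoding would currently use.
echo mb_internal_encoding(), "\n";

// Set the internal encoding once for the rest of the script.
mb_internal_encoding('UTF-8');

// Now the encoding argument can be omitted and whole characters are kept.
$slice = mb_substr($text, 0, 5);
echo $slice, "\n";
```

Passing the encoding explicitly on each call remains the safer choice when the code may run on servers whose configuration you do not control.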