
Problem with diacritics and mb_substr


I am slicing a Unicode string with diacritics using the mb_substr function, but it behaves as if I were using the plain substr function: it splits the multibyte characters in half, and the browser displays a question-mark diamond instead.

E.g.

echo mb_substr('ááááá', 0, 5); //Displays áá�

What might be wrong?
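One way to confirm what is happening is to compare the byte count with the character count. The sketch below assumes the script file itself is saved as UTF-8, where each 'á' (U+00E1) takes two bytes. It shows that cutting at 5 bytes keeps two whole characters plus one dangling lead byte, which matches the "áá�" output above.

```php
<?php
// Each 'á' (U+00E1) is encoded in UTF-8 as two bytes: 0xC3 0xA1.
$s = 'ááááá'; // assumes this file is saved as UTF-8

echo strlen($s), "\n";                // byte count: 10
echo mb_strlen($s, 'UTF-8'), "\n";    // character count: 5

// Cutting at 5 *bytes* keeps two whole 'á' characters plus a lone 0xC3
// lead byte, which the browser renders as the replacement character.
echo bin2hex(substr($s, 0, 5)), "\n"; // c3a1c3a1c3
```

If mb_substr without an encoding argument prints the same thing as substr here, it is treating the string as a single-byte encoding.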


Solution

  • I have the same problem if I don't specify the encoding as the last parameter to mb_substr: it defaults, at least on my server, to ISO-8859-1.


    But if I set the encoding properly, to UTF-8, it works OK:

    echo mb_substr('ááááá', 0, 5, 'UTF-8');
    

    I get the right display in the browser:

    ááááá
    


    See the mb_substr documentation (quoting):

    string mb_substr ( string $str , int $start [, int $length [, string $encoding ]] )
    

    The encoding parameter is the character encoding. If it is omitted, the internal character encoding value will be used.
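Following that quote, there are two ways to get character-based slicing: pass 'UTF-8' on every call, or change the internal encoding once with mb_internal_encoding (an mbstring function that the answer above does not mention; which default you start from depends on your PHP version and php.ini settings). A minimal sketch of both, assuming the script file is saved as UTF-8:

```php
<?php
$s = 'ááááá'; // assumes this file is saved as UTF-8

// Option 1: pass the encoding explicitly on every call.
echo mb_substr($s, 0, 5, 'UTF-8'), "\n"; // ááááá
echo mb_substr($s, 0, 2, 'UTF-8'), "\n"; // áá

// Option 2: set the internal encoding once, so later calls that omit
// the $encoding argument also treat strings as UTF-8.
mb_internal_encoding('UTF-8');
echo mb_substr($s, 0, 5), "\n";          // ááááá
echo mb_internal_encoding(), "\n";       // UTF-8
```

Passing the encoding explicitly is the more defensive choice, since it does not depend on a global setting that other code (or the server configuration) might change.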