mb_substr not truncating Chinese characters properly


When I run this code:

$x = '国際交流基金 - 「松島図屏風」他 日米所蔵作品による夢の競演「宗達:創造の波」展開催';
var_dump(mb_substr($x, 0, 80));

I expect the string not to be truncated, because it is shorter than 80 characters.

However, this is the output:

string(80) "国際交流基金 - 「松島図屏風」他 日米所蔵作品による夢�"

Any idea why mb_substr is truncating it (and not truncating the last character properly)?


Solution

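  • mb_substr() counts characters according to an encoding. When none is passed, it falls back to the internal encoding, which here is evidently a single-byte one: the 80 is applied to bytes (note the string(80) in the output), so the cut lands in the middle of a multi-byte UTF-8 character. Set the proper encoding in one of these ways:

    1. as the fourth parameter, e.g. mb_substr($x, 0, 80, "UTF-8")
    2. or via mb_internal_encoding("UTF-8") before calling mb_substr()
    3. or through runtime configuration (the default_charset ini setting in PHP 5.6 and later; mbstring.internal_encoding in older versions).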

    Example:

    $x = '国際交流基金 - 「松島図屏風」他 日米所蔵作品による夢の競演「宗達:創造の波」展開催';
    var_dump(mb_substr($x, 0, 80, "UTF-8"));
    

    Output (var_dump() reports the length in bytes, not characters, which is why it shows 123):

    string(123) "国際交流基金 - 「松島図屏風」他 日米所蔵作品による夢の競演「宗達:創造の波」展開催"
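
For completeness, here is a minimal sketch of option 2, together with a way to reproduce the broken cut on purpose. It assumes the mbstring extension is loaded. Passing "ISO-8859-1" explicitly is only a stand-in for whatever single-byte internal encoding the asker's setup had, which the question does not state.

```php
<?php
// Same sample string as in the question.
$x = '国際交流基金 - 「松島図屏風」他 日米所蔵作品による夢の競演「宗達:創造の波」展開催';

// Assumption: simulate a single-byte internal encoding by naming one explicitly.
// Every byte is then treated as one character, so 80 means 80 bytes.
$broken = mb_substr($x, 0, 80, 'ISO-8859-1');
var_dump(strlen($broken));                      // byte length of the cut string
var_dump(mb_check_encoding($broken, 'UTF-8'));  // is the result still valid UTF-8?

// Option 2: set the internal encoding once, then omit the fourth parameter.
mb_internal_encoding('UTF-8');
$ok = mb_substr($x, 0, 80);
var_dump(mb_strlen($x));   // length in characters, well under 80
var_dump($ok === $x);      // nothing was cut off
```

Setting the internal encoding once is convenient when many mb_* calls share the same encoding. Passing it per call keeps each call independent of global state.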