Why is the trim function not working correctly for Japanese input?


I want to trim full-width spaces from the beginning and end of strings. The strings can contain Japanese and/or English letters. However, it does not work correctly for strings starting with hiragana or katakana.

// Each space inside the quotes below is a full-width space (U+3000).

//test1
$text = "　　romaji　　";
var_dump(trim($text,"　")); // returns "romaji"

//test2
$text = "　　ひらがな　　";
var_dump(trim($text,"　")); // returns "��らがな"

//test3
$text = "　　カタカナ　　";
var_dump(trim($text,"　")); // returns "��タカナ"

//test4
$text = "　　漢字　　";
var_dump(trim($text,"　")); // returns "漢字"

Why is the code not working for tests 2 and 3? How can we solve it?


Solution

  • This is hard to troubleshoot. It is described in more detail here:

    1. PHP output showing little black diamonds with a question mark

    2. Trouble with UTF-8 characters; what I see is not what I stored
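
    The garbled output in tests 2 and 3 can be traced at the byte level. trim works on single bytes, not on characters, so every byte of the mask is stripped individually. The sketch below assumes the source file is saved as UTF-8 and that the mask is the full-width space U+3000; it only illustrates the cause and is not a fix.

    ```php
    <?php
    // Assumption: this file is saved as UTF-8, so "　" is the
    // full-width space U+3000, encoded as the three bytes e3 80 80.
    $space = "　";
    echo bin2hex($space), "\n";   // e38080
    echo bin2hex("ひ"), "\n";     // e381b2 (starts with e3, like the space)
    echo bin2hex("カ"), "\n";     // e382ab (starts with e3, like the space)
    echo bin2hex("漢"), "\n";     // e6bca2 (shares no byte with the space)

    // trim() strips any leading or trailing *byte* found in the mask,
    // so the leading e3 of "ひ" is removed and an invalid sequence remains.
    $trimmed = trim("　ひらがな　", $space);
    echo bin2hex(substr($trimmed, 0, 2)), "\n";    // 81b2

    // preg_match() with the "u" modifier returns false on invalid UTF-8.
    var_dump(preg_match('//u', $trimmed) === 1);   // bool(false)
    ```

    The kanji in test 4 survives only because none of its bytes happen to appear in the mask.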

    To work around this, you can use str_replace to replace every full-width space in the string with an empty string. This removes all of them, including the ones between words, so it is not recommended for sentences. It works well for single words.

    $text = "　　ひらがな　　";
    $new_str = str_replace('　', '', $text); // the search string is a full-width space (U+3000)
    echo $new_str;    // prints ひらがな
    

    If you only want to remove spaces at the beginning and end, use a regular expression with preg_replace:

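    // The "u" modifier makes PCRE read the string as UTF-8; \x{3000} is the full-width space.
    print preg_replace( '/^[\s\x{3000}]+|[\s\x{3000}]+$/u', '', "　　　　ひらがな　ひらがな" ); // prints ひらがな　ひらがな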
    

    trim is actually nine times faster than preg_replace, but you can still use preg_replace here.
    See the speed comparison:

    https://stackoverflow.com/a/4787238/10915534
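
    The preg_replace approach above can be wrapped in a small helper. This is only a sketch: the function name trim_fullwidth is hypothetical (it is not a PHP built-in), and it assumes the input is valid UTF-8 and that only ASCII whitespace and the full-width space U+3000 need to be stripped.

    ```php
    <?php
    // Hypothetical helper (the name is illustrative, not a PHP built-in).
    // Strips ASCII whitespace and the full-width space U+3000 from both ends.
    function trim_fullwidth(string $text): string
    {
        // The "u" modifier makes PCRE treat pattern and subject as UTF-8,
        // so \x{3000} matches a whole character rather than single bytes.
        $result = preg_replace('/^[\s\x{3000}]+|[\s\x{3000}]+$/u', '', $text);

        // preg_replace() returns null on failure (for example, invalid
        // UTF-8 input); fall back to the untouched input in that case.
        return $result ?? $text;
    }

    var_dump(trim_fullwidth("\u{3000}romaji\u{3000}"));      // string(6) "romaji"
    var_dump(trim_fullwidth("\u{3000}ひらがな\u{3000}"));     // string(12) "ひらがな"
    var_dump(trim_fullwidth("\u{3000}カタカナ　カタカナ"));    // the inner space is kept
    ```

    On PHP 8.4 or later, the built-in mb_trim function may make such a helper unnecessary.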