php utf-8 str-replace chinese-locale

PHP str_replace unintentionally removing Chinese characters


I have a PHP script that removes special characters, but unfortunately some Chinese characters are also removed.

<?php

function removeSpecialCharactersFromString($inputString){
    $inputString = str_replace(str_split('#/\\:*?\"<>|[]\'_+(),{}’! &'), "", $inputString);
    return $inputString;
} 

$test = '赵景然 赵景然';
print(removeSpecialCharactersFromString($test));

?>

Oddly, the output is 赵然 赵然. The character 景 is removed.

In addition, 陈 and 一 are also removed. What might be the cause?


Solution

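  • The string you're using as the list of things you want to replace doesn't work well with the mixed encoding. str_split() splits by byte, so the multi-byte ’ is broken into separate bytes, and some of those bytes also occur inside the UTF-8 encoding of Chinese characters such as 景. What I've done is to convert this string to UTF-16 and then split it.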

    function removeSpecialCharactersFromString($inputString){
        $inputString = str_replace(str_split(
                mb_convert_encoding('#/\\:*?\"<>|[]\'_+(),{}’! &', 'UTF16')), "", $inputString);
        return $inputString;
    }
    $test = '#赵景然 赵景然';
    print(removeSpecialCharactersFromString($test));
    

    Which gives...

    赵景然赵景然
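
To see why the original version damaged some characters, it can help to look at the raw bytes. The snippet below is only an illustration and assumes the source file is saved as UTF-8; the variable names are made up for this example. It prints the bytes that str_split() produces for ’ and marks which of them also appear in each test character.

```php
<?php
// Illustration only (assumes this file is saved as UTF-8).
// str_split() works on bytes, so the curly quote becomes three separate bytes.
$quoteBytes = array_map('bin2hex', str_split('’'));
echo implode(' ', $quoteBytes), PHP_EOL; // e2 80 99

// Characters from the question: 景, 陈 and 一 were damaged, 赵 and 然 were not.
foreach (['景', '陈', '一', '赵', '然'] as $char) {
    $charBytes = array_map('bin2hex', str_split($char));
    // Bytes of this character that str_replace() would have stripped out.
    $shared = array_values(array_intersect($charBytes, $quoteBytes));
    $sharedText = $shared ? implode(' ', $shared) : 'none';
    echo $char, ': ', implode(' ', $charBytes), ' | shared: ', $sharedText, PHP_EOL;
}
```

景 and 陈 each contain the byte 99 and 一 contains the byte 80, so removing those single bytes leaves an invalid sequence behind, while 赵 and 然 share no bytes with ’.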
    

    BTW, str_replace is MB safe (I sort of recognised the poster of this note): http://php.net/manual/en/ref.mbstring.php#109937
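
As a possible alternative to the UTF-16 conversion, the list of characters can be split by code point rather than by byte. This is a minimal sketch, not the answer's original method: it assumes PHP 7.4 or later with the mbstring extension (for mb_str_split()) and a UTF-8 source file, and the function name removeSpecialCharactersMb is hypothetical.

```php
<?php
// Sketch under stated assumptions: PHP 7.4+, mbstring enabled, UTF-8 source file.
// removeSpecialCharactersMb is a hypothetical name used only for this example.
function removeSpecialCharactersMb(string $inputString): string
{
    // Split by code point so that ’ stays one three-byte element
    // instead of three independent bytes.
    // Same character set as the original list; the \" there was a literal
    // backslash plus a double quote, and the backslash is already covered by \\.
    $specialCharacters = mb_str_split('#/\\:*?"<>|[]\'_+(),{}’! &', 1, 'UTF-8');

    // str_replace() with whole UTF-8 sequences as needles cannot match
    // in the middle of another character.
    return str_replace($specialCharacters, '', $inputString);
}

echo removeSpecialCharactersMb('#赵景然 赵景然'), PHP_EOL; // 赵景然赵景然
echo removeSpecialCharactersMb('陈 一'), PHP_EOL;         // 陈一
```

On PHP versions older than 7.4, preg_split('//u', $list, -1, PREG_SPLIT_NO_EMPTY) should produce the same per-character array.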