Check if a string is multibyte in PHP


I want to check whether a string is multibyte in PHP. Any idea how to accomplish this?

Example:

<?php
$string = "I dont have idea that is what i am...";
if( is_multibyte( $string ) )
{
    echo 'yes!!';
}else{
    echo 'ups!';
}
?>

Maybe something like this (using the 8-bits-per-byte rule):

<?php
// strlen() counts bytes, mb_strlen() counts characters
if( strlen( $string ) > mb_strlen( $string, 'UTF-8' ) )
{
    return true;
}
else
{
    return false;
}
?>

I have read the Wikipedia articles on variable-width encoding and UTF-8.


Solution

  • There are two interpretations. The first is that every character in the string is multibyte. The second is that the string contains at least one multibyte character. If you are interested in handling invalid byte sequences, see https://stackoverflow.com/a/13695364/531320 for details.

    function is_all_multibyte($string)
    {
        // reject strings that contain an invalid UTF-8 byte sequence
        if (mb_check_encoding($string, 'UTF-8') === false) return false;
    
        $length = mb_strlen($string, 'UTF-8');
    
        for ($i = 0; $i < $length; $i += 1) {
    
            $char = mb_substr($string, $i, 1, 'UTF-8');
    
            // a single-byte (ASCII) character means not every character is multibyte
            if (mb_check_encoding($char, 'ASCII')) {
    
                return false;
    
            }
    
        }
    
        return true;
    
    }
    
    function contains_any_multibyte($string)
    {
        return !mb_check_encoding($string, 'ASCII') && mb_check_encoding($string, 'UTF-8');
    }
    
    $data = ['東京', 'Tokyo', '東京(Tokyo)'];
    
    var_dump(
        [true, false, false] ===
        array_map(function($v) {
            return is_all_multibyte($v);
        },
        $data),
        [true, false, true] ===
        array_map(function($v) {
            return contains_any_multibyte($v);
        },
        $data)
    );
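
The length comparison from the question can also answer the "contains at least one multibyte character" interpretation. The following is only a minimal sketch, assuming the input is meant to be UTF-8 and the mbstring extension is loaded. The names `is_multibyte` (taken from the call in the question) and `has_non_ascii` are hypothetical helpers, not PHP built-ins.

```php
<?php
// Hypothetical helper named after the call in the question; not a PHP built-in.
function is_multibyte(string $string): bool
{
    // Invalid UTF-8 cannot be counted reliably, so treat it as "not multibyte".
    if (!mb_check_encoding($string, 'UTF-8')) {
        return false;
    }

    // strlen() counts bytes; mb_strlen() counts characters (code points).
    // More bytes than characters means at least one character needs 2+ bytes.
    return strlen($string) > mb_strlen($string, 'UTF-8');
}

// Hypothetical regex variant: with the /u modifier the character class
// matches any code point above U+007F. preg_match() returns false on
// invalid UTF-8 input, so the strict comparison yields false in that case.
function has_non_ascii(string $string): bool
{
    return preg_match('/[^\x00-\x7F]/u', $string) === 1;
}
```

Both helpers should agree with `contains_any_multibyte()` above for valid UTF-8 input: `'東京'` and `'東京(Tokyo)'` count as multibyte, `'Tokyo'` does not.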