Tags: php, function, unicode, emoji, bin2hex

PHP emoji-to-Unicode conversion does not handle more than one emoji correctly


This function converts an emoji to its Unicode code point notation (e.g. U+1F600):

    function emoji_to_unicode($emoji) {
        $emoji = mb_convert_encoding($emoji, 'UTF-32', 'UTF-8');
        $unicode = strtoupper(preg_replace("/^[0]+/","U+",bin2hex($emoji)));
        return $unicode;
    }

Usage:

    $var = "😀";
    echo emoji_to_unicode($var);

This returns U+1F600. The problem is that when $var contains more than one emoji, the result is wrong. For example:

    $var = "😀😀";
    echo emoji_to_unicode($var);

returns U+1F6000001F600, when it should return U+1F600 U+1F600.

It works fine when converting a single emoji, but not when converting several emoji at once.
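The raw hex string explains the output: in UTF-32 every code point takes exactly 4 bytes (8 hex digits), so two emoji become one 16-digit string, and the `^`-anchored regex only strips the zeros in front of the first one. The sketch below shows this and one possible way to split the hex into 8-digit groups. It assumes PHP 7.4+ (for the arrow function), the mbstring extension, and a source file saved as UTF-8; the variable names are only illustrative.

```php
<?php
// Encode two emoji as UTF-32 (mbstring emits big-endian byte order, no BOM).
$utf32 = mb_convert_encoding("😀😀", 'UTF-32', 'UTF-8');

// Each code point occupies 4 bytes, i.e. 8 hex digits.
$hex = bin2hex($utf32);
echo $hex, "\n"; // 0001f6000001f600

// The original regex is anchored with ^, so only the zeros in front of the
// first code point are replaced; the second code point stays glued on.
echo strtoupper(preg_replace('/^[0]+/', 'U+', $hex)), "\n"; // U+1F6000001F600

// Splitting into 8-digit groups handles each code point separately;
// sprintf('%04X') drops the leading zeros but keeps at least 4 hex digits.
$codes = array_map(
    fn(string $chunk): string => sprintf('U+%04X', hexdec($chunk)),
    str_split($hex, 8)
);
echo implode(' ', $codes), "\n"; // U+1F600 U+1F600
```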


Solution

  • One way to do this is to iterate over each character in $var and convert them one at a time. To make the function more robust, replace only 3 leading zeros rather than all of them (so as not to mess up values that, e.g., start with 4); that way the function works for other characters too. I've also added a check (using mb_ord, available since PHP 7.2) that returns characters below code point 256 unchanged, so it works with plain text too:

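    function emoji_to_unicode($emoji) {
        // Characters below code point 256 (ASCII/Latin-1) are returned unchanged
        if (mb_ord($emoji) < 256) return $emoji;
        // In UTF-32 each code point is 4 bytes, i.e. 8 hex digits
        $emoji = mb_convert_encoding($emoji, 'UTF-32', 'UTF-8');
        // Replace exactly 3 leading zeros with "U+" (0001f600 becomes U+1F600 after uppercasing)
        $unicode = strtoupper(preg_replace("/^[0]{3}/", "U+", bin2hex($emoji)));
        return $unicode;
    }

    $var = "😀x😀hello";
    $out = '';
    // Convert one character at a time so that each emoji gets its own "U+" prefix
    $len = mb_strlen($var);
    for ($i = 0; $i < $len; $i++) {
        $out .= emoji_to_unicode(mb_substr($var, $i, 1));
    }
    echo "$out\n";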
    

    Output:

    U+1F600xU+1F600hello
    

    Demo on 3v4l.org
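A variation on the same idea, sketched below, skips the UTF-32/bin2hex round trip and formats the integer returned by mb_ord directly. The function name emoji_to_unicode_all is hypothetical, and the sketch assumes PHP 7.2+ (for mb_ord), the mbstring extension, and UTF-8 input. Characters are split with preg_split and the `u` modifier instead of looping over mb_substr.

```php
<?php
/**
 * Hypothetical helper: replace every character whose code point is 256 or
 * above with "U+XXXX" notation, leaving lower code points untouched.
 */
function emoji_to_unicode_all(string $text): string
{
    // Split into single characters; the /u modifier makes the split UTF-8 aware.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);

    $out = [];
    foreach ($chars as $char) {
        $cp = mb_ord($char, 'UTF-8');
        // Same threshold as the answer: code points below 256 pass through.
        $out[] = $cp < 256 ? $char : sprintf('U+%04X', $cp);
    }
    return implode('', $out);
}

echo emoji_to_unicode_all("😀x😀hello"), "\n"; // U+1F600xU+1F600hello
```

For emoji the result matches the answer's output. For characters in the Basic Multilingual Plane the two differ: sprintf('%04X') pads to at least four hex digits (☀ becomes U+2600), whereas the `{3}` regex leaves five digits (U+02600).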