
PHP utf8_encode/utf8_decode deprecated: what can I use?


90% of my website's pages use utf8_encode when compiling a DataTable.

$a[] = array_map('utf8_encode', $item);

With the old version, PHP 8.0, everything was fine; in the new version I get an error when a value in $item ($item is an array) is null.

What is a valid alternative?


Solution

  • Before converting any code, make sure the function is actually doing what you want. It is not a magic "fix all my UTF-8 problems" function.

    In particular, see the note on the PHP manual page for utf8_encode:

    This function does not attempt to guess the current encoding of the provided string, it assumes it is encoded as ISO-8859-1 (also known as "Latin 1") and converts to UTF-8. Since every sequence of bytes is a valid ISO-8859-1 string, this never results in an error, but will not result in a useful string if a different encoding was intended.

    Many web pages marked as using the ISO-8859-1 character encoding actually use the similar Windows-1252 encoding, and web browsers will interpret ISO-8859-1 web pages as Windows-1252. Windows-1252 features additional printable characters, such as the Euro sign (€) and curly quotes (“ ”), instead of certain ISO-8859-1 control characters. This function will not convert such Windows-1252 characters correctly. Use a different function if Windows-1252 conversion is required.

    If you are sure it was the right function, the manual page suggests several alternatives, principally mb_convert_encoding.

    The following all give the same result:

    $iso8859_1_string = "\x5A\x6F\xEB"; // 'Zoë' in ISO-8859-1
    
    $utf8_string = utf8_encode($iso8859_1_string);
    echo bin2hex($utf8_string), "\n";
    
    $utf8_string = mb_convert_encoding($iso8859_1_string, 'UTF-8', 'ISO-8859-1');
    echo bin2hex($utf8_string), "\n";
    
    $utf8_string = UConverter::transcode($iso8859_1_string, 'UTF8', 'ISO-8859-1');
    echo bin2hex($utf8_string), "\n";
    
    $utf8_string = iconv('ISO-8859-1', 'UTF-8', $iso8859_1_string);
    echo bin2hex($utf8_string), "\n";
    

    Each of these also lets you specify what encoding you are converting from, which is information you need to work out.

    While there are functions that attempt to guess that encoding, it is impossible for a computer to know how text was intended to be interpreted. Imagine you have a price of "89.99", and you want the equivalent price in Euros. You can guess that it's not in Japanese Yen or Bitcoin, because the numbers are unlikely; but it could easily be in US Dollars, British Pounds, Indian Rupees, etc. Text encoding is similar: many sequences of bytes are just as valid and likely in a whole range of different encodings.
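
As for the null values mentioned in the question, one possible approach is to wrap the conversion in a small helper that passes null through untouched. The following is only a sketch under some assumptions: the name to_utf8_or_null is made up for illustration, the mbstring extension is assumed to be installed, and the default source encoding of ISO-8859-1 is assumed only because that is what utf8_encode always used. The last two lines show why the source encoding matters: the same byte gives different results depending on which encoding you declare.

```php
<?php
// Hypothetical helper (not part of PHP): converts one value to UTF-8,
// returning null as-is instead of handing it to the converter.
// Assumes the mbstring extension is available.
function to_utf8_or_null(?string $value, string $from = 'ISO-8859-1'): ?string
{
    return $value === null ? null : mb_convert_encoding($value, 'UTF-8', $from);
}

// A sample row standing in for $item from the question, with one null column.
$item = ["Zo\xEB", null, "caf\xE9"];

// array_map passes each element as the first argument; $from keeps its default.
$row = array_map('to_utf8_or_null', $item);

echo bin2hex($row[0]), "\n"; // 5a6fc3ab
var_dump($row[1]);           // NULL

// Same input byte, different declared source encoding, different result:
echo bin2hex(to_utf8_or_null("\x80", 'ISO-8859-1')), "\n";   // c280 (a control character)
echo bin2hex(to_utf8_or_null("\x80", 'Windows-1252')), "\n"; // e282ac (the Euro sign)
```

If the data actually comes from a source that is already UTF-8, no conversion is needed at all, and calling the helper would corrupt the text, so it is worth confirming the real source encoding first.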