
PHP: mb_convert_encoding() to UTF-16LE doesn't work


I have code that needs to support Japanese. Internally my data is all UTF-8, but one rarely-used routine exports a text file for import into PowerPoint. For old versions of PowerPoint, the required encoding was Shift-JIS, and mb_convert_encoding($output, "SJIS") worked just fine for many years. But now I've discovered that from Office 2016 onward, the encoding needs to be UTF-16 LE (Microsoft just has to be different...sigh!). Fine, I thought, I'll just change the expression to mb_convert_encoding($output, "UTF-16LE"). But whatever PHP is doing, the resulting file is not recognized as being Unicode at all (and of course looks horrid). Notepad++ thinks it's "GB2312 (Simplified)" and even thinks the line endings are CR only, even though they are definitely CRLF. Anyone have a guess as to why it doesn't work?
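A quick way to see what PHP is actually producing is to dump the converted bytes. The sketch below uses the same UTF-8 input as the answer's example, plus a CRLF line ending (an assumption chosen to mirror the exported file). The output starts directly with the first character (日 = U+65E5 becomes `e5 65`) and contains no byte order mark, while each CRLF becomes the four bytes `0d 00 0a 00`.

```php
<?php
// UTF-8 bytes for 日本語 followed by CRLF (escaped so the source file's encoding doesn't matter)
$text = "\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e\r\n";

// Convert explicitly from UTF-8 to UTF-16LE; mb_convert_encoding() does not add a BOM
$utf16 = mb_convert_encoding($text, "UTF-16LE", "UTF-8");

echo bin2hex($utf16), PHP_EOL; // e5652c679e8a0d000a00
```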


Solution

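  • You are most probably missing the Byte Order Mark (BOM), which is used to indicate, well, the byte order in UTF-16 strings. mb_convert_encoding() does not add one, and without it many programs cannot tell that the file is UTF-16 at all, so they fall back to guessing a legacy encoding.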

    I struggled to find software that would consume UTF-16, but in the end I just saved the contents to a .txt file and opened it with TextEdit/Quick Look on macOS.

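    <?php
    $output = "\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e"; // 日本語 (UTF-8 bytes)
    $bom = "\xFF\xFE"; // UTF-16LE BOM; "\xFE\xFF" would indicate BE
    // Name the source encoding explicitly instead of relying on mb_internal_encoding()
    $utf16 = $bom . mb_convert_encoding($output, "UTF-16LE", "UTF-8");
    file_put_contents("output.txt", $utf16); // example file name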
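Putting the pieces together, an export routine might look roughly like the sketch below. It assumes, as the answer suggests, that the importer relies on the BOM to recognise UTF-16LE. The function name `export_utf16le`, the CRLF normalisation step and the temporary file path are illustrative choices of mine, not part of the original code.

```php
<?php
/**
 * Hypothetical helper: encode UTF-8 text as UTF-16LE with a leading BOM
 * and CRLF line endings, then write it to $path.
 */
function export_utf16le(string $path, string $utf8Text): void
{
    // Assumption: normalise any mix of \r\n, \r and \n to CRLF before converting
    $normalised = preg_replace('/\r\n|\r|\n/', "\r\n", $utf8Text);

    // mb_convert_encoding() never adds a BOM, so prepend FF FE ourselves
    $bytes = "\xFF\xFE" . mb_convert_encoding($normalised, "UTF-16LE", "UTF-8");

    if (file_put_contents($path, $bytes) === false) {
        throw new RuntimeException("Could not write $path");
    }
}

// Usage: two lines of 日本語 with Unix line endings, written to a temp file
$file = sys_get_temp_dir() . DIRECTORY_SEPARATOR . "export_utf16le_demo.txt";
export_utf16le($file, "\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e\n\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e\n");
echo bin2hex(file_get_contents($file)), PHP_EOL;
// fffee5652c679e8a0d000a00e5652c679e8a0d000a00
```

Opening the resulting file in Notepad++ should then show it as UTF-16 LE with a BOM rather than a guessed legacy encoding.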