Tags: php, character-encoding, php-stream-wrappers

How to read a file as UTF-16 using PHP stream wrappers


Is there a way to read a file in a specific character encoding like UTF-16 using PHP's stream wrappers, in the same way I can read a base64-encoded file using php://filter/convert.base64-decode/resource=file.txt?


Solution

  • PHP strings don't know anything about encodings, so PHP file functions essentially treat every file as a binary file.

    If you know that a set of bytes should be read as UTF-16, you can convert it to some other encoding of your choice (here using UTF-8 as an example) using any of these, depending on which extensions you have installed:

    // Requires ext/iconv; arguments are From, To, String
    $utf8_string = iconv('UTF-16', 'UTF-8', $utf16_string);
    // Requires ext/mbstring; arguments are String, To, From
    $utf8_string = mb_convert_encoding($utf16_string, 'UTF-8', 'UTF-16');
    // Requires ext/intl; arguments are String, To, From
    $utf8_string = UConverter::transcode($utf16_string, 'UTF-8', 'UTF-16');
    

    Conversely, if you know that the string is in some particular encoding (again, using UTF-8 as an example), and want it to be UTF-16, you would put things in the opposite order:

    // Requires ext/iconv; arguments are From, To, String
    $utf16_string = iconv('UTF-8', 'UTF-16', $utf8_string);
    // Requires ext/mbstring; arguments are String, To, From
    $utf16_string = mb_convert_encoding($utf8_string, 'UTF-16', 'UTF-8');
    // Requires ext/intl; arguments are String, To, From
    $utf16_string = UConverter::transcode($utf8_string, 'UTF-16', 'UTF-8');
    

    In both cases, the resulting string is just a different sequence of bytes; other PHP functions still won't "know" what it "means".
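
    To make the "different sequence of bytes" point concrete, here is a minimal sketch. It assumes ext/mbstring is available; the sample word and the choice of 'UTF-16BE' (which pins the byte order and avoids a byte order mark) are illustrative assumptions, not part of the original answer.

    ```php
    <?php
    // "café" as UTF-8, written with hex escapes so the encoding of this
    // source file does not matter: "é" (U+00E9) is the two bytes C3 A9.
    $utf8 = "caf\xC3\xA9";

    // Assumption: 'UTF-16BE' is used instead of plain 'UTF-16' so the byte
    // order is fixed and no byte order mark is added.
    $utf16 = mb_convert_encoding($utf8, 'UTF-16BE', 'UTF-8');

    // strlen() counts bytes, not characters.
    echo strlen($utf8), "\n";    // 5
    echo strlen($utf16), "\n";   // 8
    echo bin2hex($utf16), "\n";  // 00630061006600e9

    // Only functions that are told the encoding can count characters.
    echo mb_strlen($utf8, 'UTF-8'), "\n";      // 4
    echo mb_strlen($utf16, 'UTF-16BE'), "\n";  // 4
    ```

    The same four characters occupy 5 bytes in one encoding and 8 in the other, and strlen() reports whichever byte count it is handed.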


    The "iconv" extension also provides a conversion filter which runs the equivalent of the iconv function as a file or stream is being read. So if you have a file which you know should be read as UTF-16, and want its contents as UTF-8, you could write:

    $fp = fopen('php://filter/convert.iconv.utf-16.utf-8/resource=/path/to/utf16-file.txt', 'r');
    // fgets() returns at most length - 1 bytes (9 here) and stops early at a newline
    $first_bytes_of_utf16_converted_to_utf8 = fgets($fp, 10);
    fclose($fp);
    

    Or the reverse - a UTF-8 file which you want to read as UTF-16:

    $fp = fopen('php://filter/convert.iconv.utf-8.utf-16/resource=/path/to/utf8-file.txt', 'r');
    // fgets() returns at most length - 1 bytes (9 here) and stops early at a newline
    $first_bytes_of_utf8_converted_to_utf16 = fgets($fp, 10);
    fclose($fp);
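
    If the file handle is already open, the same conversion filter can, as far as I know, be attached with stream_filter_append() instead of a php://filter URL. The sketch below assumes ext/iconv is installed; the temporary file and its contents are hypothetical, created only so the example runs on its own, and 'utf-16le' is pinned because the handling of byte order marks for plain 'utf-16' can differ between iconv implementations.

    ```php
    <?php
    // Hypothetical sample file: "café" stored as UTF-16LE (no byte order mark).
    $path = tempnam(sys_get_temp_dir(), 'utf16');
    file_put_contents($path, iconv('UTF-8', 'UTF-16LE', "caf\xC3\xA9"));

    // Open the file normally, then attach the conversion filter for reads
    // before any data has been read from the handle.
    $fp = fopen($path, 'r');
    stream_filter_append($fp, 'convert.iconv.utf-16le.utf-8', STREAM_FILTER_READ);

    // Reading the whole stream avoids cutting a character in half.
    $utf8 = stream_get_contents($fp);
    fclose($fp);
    unlink($path);

    echo bin2hex($utf8), "\n";  // 636166c3a9
    ```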
    

    Again, it's important to remember that PHP is working in bytes, so the fgets calls above may return corrupted text: the length limit is counted in bytes, so a read can stop partway through a multi-byte character rather than at the end of a Unicode code point.
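
    One way to sidestep that problem, sketched here under some assumptions, is to read whole lines (or the whole file) and only then cut by characters with an encoding-aware function. This assumes ext/iconv and ext/mbstring are both installed; the temporary file, its contents, and the explicit 'utf-16le' source encoding are hypothetical choices made for illustration.

    ```php
    <?php
    // Hypothetical sample file: two lines of text stored as UTF-16LE.
    $path = tempnam(sys_get_temp_dir(), 'utf16');
    file_put_contents(
        $path,
        iconv('UTF-8', 'UTF-16LE', "h\xC3\xA9llo w\xC3\xB6rld\nsecond line\n")
    );

    $fp = fopen('php://filter/convert.iconv.utf-16le.utf-8/resource=' . $path, 'r');

    // Without a length argument, fgets() reads up to and including the next
    // "\n". In UTF-8 the byte 0x0A never occurs inside a multi-byte character,
    // so the line is returned intact.
    $first_line = fgets($fp);
    fclose($fp);
    unlink($path);

    // Cut by characters, telling mb_substr() which encoding the bytes are in.
    $first_5_chars = mb_substr($first_line, 0, 5, 'UTF-8');
    echo $first_5_chars, "\n";  // héllo

    // For contrast: cutting by bytes stops after only four characters here,
    // because "é" takes two bytes in UTF-8.
    echo bin2hex(substr($first_line, 0, 5)), "\n";  // 68c3a96c6c
    ```

    Cutting by bytes can also land in the middle of a character; cutting after 2 bytes in this example would leave a lone C3 byte, which is not valid UTF-8.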