unicode, encoding, cygwin, utf-16, iconv

Why does iconv(1) in Cygwin produce big-endian UTF-16 with `-t utf-16`?


On Cygwin 1.7.25 with libiconv 1.14-2, iconv(1) produces big-endian UTF-16 (with a BOM) when invoked as `iconv -t utf-16`, even though x86 is little-endian (and Windows produces little-endian UTF-16). Isn't libiconv supposed to use platform-dependent endianness for the default utf-16 conversion? It's not necessarily a problem for the apps I am using, since they can handle both by reading the BOM, but the behavior is still peculiar. To reproduce: create a new file in Notepad, which saves it as UTF-16LE with a BOM. Run it through iconv(1) on the same system with `-t utf-16`, and you get a byte-swapped file with a big-endian BOM.


Solution

  • The Unicode specification indicates a preference for big-endian, and non-Microsoft software often uses that by default. In particular, when UTF-16 is encoded without a BOM and there is no higher-level protocol (such as the medium declaring a byte order, as with networks and network byte order), the byte order is big-endian. However, some software does not adhere to the specification and assumes little-endian when there is no BOM, so a BOM may be added to allow such software to work.

    Isn't libiconv supposed to use platform-dependent endianness for the default utf-16 conversion?

    Not as far as I know. What makes you think this?
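
To make the byte-order rule above concrete, here is a minimal Python sketch of a decoder that honors a BOM when one is present and otherwise falls back to big-endian. The helper name `decode_utf16` and the sample string are hypothetical, chosen only for illustration; this is not how iconv or libiconv is implemented.

```python
import codecs


def decode_utf16(data: bytes) -> str:
    """Decode UTF-16 bytes: use the BOM if present, else assume big-endian.

    Hypothetical helper for illustration only.
    """
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return data[2:].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return data[2:].decode("utf-16-be")
    # No BOM and no higher-level protocol: treat the data as big-endian.
    return data.decode("utf-16-be")


# Little-endian with BOM, the form the question says Notepad saves.
little = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")
# Big-endian with BOM, the form the question reports from `iconv -t utf-16`.
big = codecs.BOM_UTF16_BE + "hi".encode("utf-16-be")
# Big-endian without any BOM.
bare = "hi".encode("utf-16-be")

for label, payload in (("little", little), ("big", big), ("bare", bare)):
    print(label, payload.hex(" "), "->", decode_utf16(payload))
```

All three byte sequences decode to the same text, which is why applications that read the BOM handle either output without trouble. Only the BOM-less case depends on the big-endian default.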