qt endianness utf-16 byte-order-mark utf-32

In Qt how do QTextCodec::codecForName("UTF-16") and codecForName("UTF-32") decide the endianness to use?

In the Qt documentation it states that (among others) the following Unicode string encodings are supported:

UTF-8
UTF-16
UTF-16BE
UTF-16LE
UTF-32
UTF-32BE
UTF-32LE

Due to the three different codecs listed for 2 and 4 octet encoded Unicode, I was wondering: how do the two non-endian codecs ("UTF-16" and "UTF-32") decide which endianness to use?

Solution

Based on the source code in src/corelibs/codecs/, it seems Qt uses the byte ordering of the host for UTF-16 and UTF-32.

If you use QTextCodec to read an existing Unicode string that has a BOM, and you didn't explicitly ask to ignore the header, the byte ordering detected in the string is used.

In *qutfcodec_p.h* both QUtf16Codec::e and QUtf32Codec::e are initialized with the value DetectEndianness (an enum).
In qutfcodec.cpp, near the beginning of the functions convertFromUnicode and convertToUnicode from the classes QUtf16 and QUtf32 (used by QUtf16Codec and QUtf32Codec), you can find the line:
```
endian = (QSysInfo::ByteOrder == QSysInfo::BigEndian) 
    ? BigEndianness : LittleEndianness;
```