The Qt documentation states that (among others) the following Unicode string encodings are supported:

Given the three different codecs listed for 2- and 4-octet encoded Unicode, I was wondering: how do the two codecs whose names don't specify an endianness ("UTF-16" and "UTF-32") decide which byte order to use?
Based on the source code in `src/corelib/codecs/`, it seems Qt uses the host's byte order for UTF-16 and UTF-32.
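You can observe this from the public API. A minimal sketch, assuming Qt 5 (where `QTextCodec` is part of QtCore; in Qt 6 it moved to the Qt5Compat module):

```cpp
#include <QTextCodec>
#include <QByteArray>
#include <QDebug>

int main()
{
    // The codec whose name carries no BE/LE suffix.
    QTextCodec *codec = QTextCodec::codecForName("UTF-16");

    // Encoding prepends a BOM and writes the code units in host order.
    QByteArray bytes = codec->fromUnicode(QString("A"));

    // Prints "fffe4100" on a little-endian host, "feff0041" on a big-endian one.
    qDebug() << bytes.toHex();
    return 0;
}
```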
If you use `QTextCodec` to read an existing Unicode string that carries a BOM, and you didn't explicitly ask for the header to be ignored (the `QTextCodec::IgnoreHeader` conversion flag), the byte order detected from the BOM is used.
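For example (a sketch under the same Qt 5 assumption as above), feeding the same character with opposite BOMs to the one codec yields the same string on any host, because the BOM wins over the host's byte order:

```cpp
#include <QTextCodec>
#include <QByteArray>
#include <QDebug>

int main()
{
    QTextCodec *codec = QTextCodec::codecForName("UTF-16");

    // 'A' encoded as UTF-16BE and UTF-16LE, each with the matching BOM.
    QByteArray be = QByteArray::fromHex("feff0041");
    QByteArray le = QByteArray::fromHex("fffe4100");

    // Both print "A": the detected BOM determines the byte order.
    qDebug() << codec->toUnicode(be) << codec->toUnicode(le);
    return 0;
}
```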
In *qutfcodec_p.h*, both `QUtf16Codec::e` and `QUtf32Codec::e` are initialized with the value `DetectEndianness`, an enumerator of the `DataEndianness` enum.
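For reference, that enum looks roughly like this in the private header (paraphrased from Qt's sources; exact contents may differ between Qt versions):

```cpp
// qutfcodec_p.h (private Qt header), approximately:
enum DataEndianness
{
    DetectEndianness,   // no endianness chosen yet
    BigEndianness,
    LittleEndianness
};
```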
In *qutfcodec.cpp*, near the beginning of the functions `convertFromUnicode` and `convertToUnicode` of the classes `QUtf16` and `QUtf32` (used by `QUtf16Codec` and `QUtf32Codec`), you can find the line:
```cpp
endian = (QSysInfo::ByteOrder == QSysInfo::BigEndian) ? BigEndianness : LittleEndianness;
```
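In other words: if the endianness is still undetermined when the conversion runs (no explicit BE/LE codec was chosen and, when decoding, no BOM was found), Qt falls back to the host's native byte order. A sketch of the observable effect, under the same Qt 5 assumption as above:

```cpp
#include <QTextCodec>
#include <QByteArray>
#include <QDebug>

int main()
{
    QTextCodec *codec = QTextCodec::codecForName("UTF-16");

    // Two bytes, no BOM: there is nothing to detect, so the fallback
    // above applies. A little-endian host reads 0x0041 and prints "A";
    // a big-endian host reads 0x4100 (a CJK ideograph) instead.
    QByteArray noBom = QByteArray::fromHex("4100");
    qDebug() << codec->toUnicode(noBom);
    return 0;
}
```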