UTF-16 string literals, such as auto str = u"中国字";, are allowed in modern C++ source code.
UTF-16 has two endiannesses: UTF-16LE and UTF-16BE. The C++ standard doesn't specify the endianness of UTF-16 string literals, so I think it is implementation-defined.
Is there any way to specify the endianness at compile time?
A string literal prefixed with u is an array of const char16_t values:
C++17 [lex.string]/10:
A string-literal that begins with u, such as u"asdf", is a char16_t string literal. A char16_t string literal has type “array of n const char16_t”, where n is the size of the string as defined below; it is initialized with the given characters.
So, on a Unicode system, the literal in the quote is equivalent to:
const char16_t x[] = { 97, 115, 100, 102, 0 };
In other words, the representation of the string literal is the same as the representation of that array.
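That equivalence can be checked at compile time. The following is a minimal sketch, assuming a C++14-or-later compiler (for constexpr loops) and that the literal encoding maps a, s, d, f to their usual Unicode values 97, 115, 100, 102:

#include <cstddef>

constexpr char16_t lit[] = u"asdf";
constexpr char16_t expected[] = { 97, 115, 100, 102, 0 };

// Element-by-element comparison usable in a constant expression.
constexpr bool same(const char16_t* a, const char16_t* b, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        if (a[i] != b[i]) return false;
    return true;
}

static_assert(sizeof(lit) == sizeof(expected), "five char16_t elements each");
static_assert(same(lit, expected, 5), "u\"asdf\" is {97, 115, 100, 102, 0}");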
For a more complicated string, the literal is still an array of const char16_t, and a single c-char may produce multiple char16_t code units (a surrogate pair), so the number of elements in the array can be greater than the number of characters that seem to appear in the string.
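As a small illustration (a sketch; the character choice is mine): U+1D11E, the musical G clef, lies outside the Basic Multilingual Plane, so one c-char yields two char16_t code units plus the terminating null:

constexpr char16_t clef[] = u"\U0001D11E";   // one character, encoded as a surrogate pair
static_assert(sizeof(clef) / sizeof(clef[0]) == 3,
              "one c-char -> two code units + terminating null");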
To answer the question in the title: I'm not aware of any compiler option (for any compiler) that would let you configure the endianness of char16_t. I would expect any target system to use the same endianness for all the integral types; char16_t is supposed to have the same properties as uint_least16_t ([basic.fundamental]/5).
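If you need to know at run time which byte order your implementation actually uses for char16_t, you can inspect the object representation. This is a sketch, and the function name is made up (in C++20 you could consult std::endian::native from <bit> instead):

#include <cstring>

bool char16_is_little_endian() {
    const char16_t probe = 0x0102;
    unsigned char bytes[sizeof probe];
    std::memcpy(bytes, &probe, sizeof probe);
    return bytes[0] == 0x02;   // low-order byte stored first => little-endian
}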
If your code contains string literals and you want to write them to a file as specifically UTF-16BE, for example, you'll need to do the usual endianness checks/adjustments in case your system stores char16_t in little-endian form.
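One way to sidestep the host byte order entirely is to split each code unit into bytes with shifts and emit the high byte first. A sketch (the function name is mine, and error handling is omitted):

#include <cstdio>

void write_utf16be(std::FILE* out, const char16_t* s, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        const unsigned char bytes[2] = {
            static_cast<unsigned char>(s[i] >> 8),    // high-order byte first
            static_cast<unsigned char>(s[i] & 0xFF)   // low-order byte second
        };
        std::fwrite(bytes, 1, sizeof bytes, out);
    }
}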