C11 adds the portable wide character types char16_t and char32_t for UTF-16 and UTF-32, respectively.
However, in the technical report, there is no mention of endianness for these two types.
For example, the following snippet, compiled with gcc-4.8.4 and -std=c11 on my x86_64 computer:
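#include <stdio.h>
#include <uchar.h>
int main(void) {
    char16_t utf16_str[] = u"十六"; // U+5341 U+516D
    // Read the array through an unsigned char pointer to see its bytes in memory.
    unsigned char *chars = (unsigned char *) utf16_str;
    printf("Bytes: %X %X %X %X\n", chars[0], chars[1], chars[2], chars[3]);
    return 0;
}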
will produce
Bytes: 41 53 6D 51
This means that the code units are stored little-endian.
But is this behaviour platform- or implementation-dependent? Does it always follow the platform's endianness, or may an implementation choose to always store char16_t and char32_t in big-endian order?
char16_t and char32_t do not guarantee Unicode encoding. (That is a C++ feature.) When defined, the macros __STDC_UTF_16__ and __STDC_UTF_32__ indicate that values of type char16_t and char32_t are UTF-16 and UTF-32 encoded, respectively. See C11 §6.10.8.2 for these macros.
(By the way, __STDC_ISO_10646__ indicates the same thing for wchar_t, and its value, an integer of the form yyyymmL, also reveals which Unicode edition wchar_t implements. Of course, in practice, the compiler simply copies code points from the source file to strings in the object file, so it doesn't need to know much about particular characters.)
Given that Unicode encoding is in effect, code point values stored in char16_t or char32_t must have the same object representation as uint_least16_t and uint_least32_t, because they are defined to be typedef aliases to those types, respectively (C11 §7.28). This is again somewhat in contrast to C++, which makes those types distinct but explicitly requires compatible object representation.
The upshot is that yes, there is nothing special about char16_t and char32_t. They are ordinary integers in the platform's endianness.
Note that a test program only reveals endianness if it inspects how the values map to bytes in memory, as yours does by reading the array through an unsigned char pointer. Printing the char16_t values themselves would show 5341 and 516D on any platform.