In the C++ reference documentation, I noticed this for char:
The character types are large enough to represent any UTF-8 eight-bit code unit (since C++14)
and this for char8_t:
type for UTF-8 character representation, required to be large enough to represent any UTF-8 code unit (8 bits)
Does that mean both are the same type? Or does char8_t have some other feature?
char8_t is not the same type as char. It does, however, have the same size, signedness, and alignment as unsigned char, per [basic.fundamental]/9:
Type char8_t denotes a distinct type whose underlying type is unsigned char. Types char16_t and char32_t denote distinct types whose underlying types are uint_least16_t and uint_least32_t, respectively, in <cstdint>.
emphasis mine
Do note that since the standard calls it a distinct type, code like
std::cout << std::is_same_v<unsigned char, char8_t>;
will print 0 (false), even though char8_t has unsigned char as its underlying type. This is because it is not an alias, but a distinct type.
Another thing to note is that char has the same representation as either signed char or unsigned char; which one is implementation-defined. That means it is possible for char to have the same range and representation as char8_t, but they are still separate types. char, signed char, unsigned char, and char8_t are all the same size, but they are four distinct types.