Tags: c++, c++14, language-lawyer, c++20

Is C++20 'char8_t' the same as our old 'char'?


In the cppreference documentation, I noticed this for char:

The character types are large enough to represent any UTF-8 eight-bit code unit (since C++14)

and for char8_t

type for UTF-8 character representation, required to be large enough to represent any UTF-8 code unit (8 bits)

Does that mean both are the same type? Or does char8_t have some other feature?


Solution

  • char8_t is not the same as char. It has the same size, signedness, and alignment as unsigned char, but per [basic.fundamental]/9 it is a type of its own:

    Type char8_t denotes a distinct type whose underlying type is unsigned char. Types char16_t and char32_t denote distinct types whose underlying types are uint_least16_t and uint_least32_t, respectively, in <cstdint>.

    (emphasis mine: "distinct type")


    Do note that since the standard calls it a distinct type, code like

    std::cout << std::is_same_v<unsigned char, char8_t>;
    

    will print 0 (false), even though char8_t has unsigned char as its underlying type. This is because char8_t is not an alias for unsigned char, but a distinct type.
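
    To make the "same layout, different type" point concrete, here is a minimal sketch that checks both properties at compile time. It assumes a C++20 compiler; same_layout is a hypothetical helper written for this illustration, not something from the standard library.

    ```cpp
    #include <type_traits>

    // Hypothetical helper (not a standard facility): true when T and U agree
    // in size, alignment, and signedness.
    template <class T, class U>
    constexpr bool same_layout() {
        return sizeof(T) == sizeof(U) &&
               alignof(T) == alignof(U) &&
               std::is_unsigned_v<T> == std::is_unsigned_v<U>;
    }

    // char8_t matches unsigned char in size, alignment, and signedness...
    static_assert(same_layout<char8_t, unsigned char>());

    // ...but it is a separate type, not an alias for unsigned char or char.
    static_assert(!std::is_same_v<char8_t, unsigned char>);
    static_assert(!std::is_same_v<char8_t, char>);
    ```

    If the translation unit compiles, every static_assert held. Building it as C++17 or earlier should fail, because char8_t is only a keyword from C++20 onward.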


    Another thing to note is that whether char is signed or unsigned is implementation-defined. That means char may have the same range and representation as char8_t on a given platform, but they are still separate types. char, signed char, unsigned char, and char8_t all have the same size, yet they are four distinct types.
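
    One practical consequence of having four distinct types is that each can select its own overload. The following is a sketch under the assumption of a C++20 compiler; type_name is a hypothetical function invented for this example.

    ```cpp
    #include <string_view>

    // Hypothetical overload set for illustration: one overload per character
    // type. This only compiles because all four are distinct types.
    constexpr std::string_view type_name(char)          { return "char"; }
    constexpr std::string_view type_name(signed char)   { return "signed char"; }
    constexpr std::string_view type_name(unsigned char) { return "unsigned char"; }
    constexpr std::string_view type_name(char8_t)       { return "char8_t"; }

    // An ordinary character literal has type char.
    static_assert(type_name('a') == "char");

    // In C++20, a u8 character literal has type char8_t.
    static_assert(type_name(u8'a') == "char8_t");

    // An explicit conversion selects the unsigned char overload instead.
    static_assert(type_name(static_cast<unsigned char>('a')) == "unsigned char");

    // In C++20, u8 string literals are arrays of const char8_t, so this is fine:
    constexpr const char8_t* utf8_text = u8"hi";
    // ...whereas `const char* p = u8"hi";` is ill-formed in C++20.
    ```

    If char8_t were merely an alias for unsigned char, the last two type_name overloads would be redefinitions of the same function and the code would not compile.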