c, string, utf-8, c11

How can char[] represent a UTF-8 string?


In C11, a new string literal was added with the prefix u8. It produces an array of char with the text encoded as UTF-8. How is this even possible? Isn't a normal char signed, meaning it has one bit less of information to use because of the sign bit? My logic would dictate that a string of UTF-8 text needs to be an array of unsigned char.


Solution

  • Isn't a normal char signed?

    It's implementation-defined whether char is signed or unsigned.

    Further, the sign bit isn't "lost": it is still one of the bits that stores information, and only the numeric interpretation of the bit pattern changes. Also, char is not necessarily 8 bits wide (CHAR_BIT may be larger on some platforms).