I just now heard about the existence of char8_t, char16_t and char32_t, and I am testing them out. When I try to compile the code below, g++ throws the following error:
error: use of deleted function 'std::basic_ostream<char, _Traits>& std::operator<<(basic_ostream<char, _Traits>&, char32_t) [with _Traits = char_traits<char>]'
    6 |     std::cout << U'😂' << std::endl;
      |                  ^~~~~
#include <iostream>

int main() {
    char32_t c = U'😂';
    std::cout << c << std::endl;
    return 0;
}
Additionally, why can't I put the emoji into a char8_t or char16_t? For example, the following lines of code don't work:
char16_t c1 = u'😂';
char8_t c2 = u8'😂';
auto c3 = u'😂';
auto c4 = u8'😂';
From my understanding, emojis are UTF-8 characters and should therefore fit into a char8_t.
"emojis are UTF-8 characters"
There is no such thing as a "UTF-8 character".

There are Unicode codepoints. These can be represented in the UTF-8 encoding, such that each codepoint maps to a sequence of one or more UTF-8 code units (char8_ts). But that means that most codepoints map to multiple char8_ts: AKA, a string. And emojis are not among the 128 codepoints that map to a single UTF-8 code unit.
Emoji in particular can be built out of multiple codepoints, so even using UTF-32 you cannot guarantee that any given emoji can be stored in a single char32_t.
It's best to treat these things as strings, not characters, at all times. Forget that "characters" even exist.