Tags: c++, utf-8, c++20, emoji, utf

How to Store Emojis in char8_t and Print Them Out in C++20?


I just heard about the existence of char8_t, char16_t and char32_t, and I am testing them out. When I try to compile the code below, g++ throws the following error:

error: use of deleted function 'std::basic_ostream<char, _Traits>& std::operator<<(basic_ostream<char, _Traits>&, char32_t) [with _Traits = char_traits<char>]'
    6 |         std::cout << U'πŸ˜‹' << std::endl;
      |                      ^~~~~
#include <iostream>

int main() {
  char32_t c = U'πŸ˜‹';

  std::cout << c << std::endl;

  return 0;
}

Additionally, why can't I put the emoji into a char8_t or char16_t? For example, the following lines of code don't work:

char16_t c1 = u'πŸ˜‹';
char8_t c2 = u8'πŸ˜‹';
auto c3 = u'πŸ˜‹';
auto c4 = u8'πŸ˜‹';

From my understanding, emojis are UTF-8 characters and should therefore fit into a char8_t.


Solution

  • emojis are UTF-8 characters

    There is no such thing as a "UTF-8 character".

    There are Unicode codepoints. These can be represented in the UTF-8 encoding, such that each codepoint maps to a sequence of one or more UTF-8 code units: char8_ts. But that means that most codepoints map to multiple char8_ts: AKA, a string. And emojis are not among the 128 codepoints (U+0000 through U+007F) that map to a single UTF-8 code unit.
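
    For instance, here is a minimal sketch (the variable name smile is just for illustration) showing that the single codepoint U+1F60B behind πŸ˜‹ occupies four char8_t code units:

    #include <cstddef>
    #include <iostream>

    int main() {
      // u8"πŸ˜‹" is the single codepoint U+1F60B, which UTF-8 encodes
      // as four code units (plus the terminating null in the array).
      const char8_t smile[] = u8"πŸ˜‹";

      std::cout << sizeof(smile) - 1 << " code units:" << std::hex;
      for (std::size_t i = 0; i < sizeof(smile) - 1; ++i)
        std::cout << ' ' << static_cast<int>(smile[i]);  // f0 9f 98 8b
      std::cout << std::endl;

      return 0;
    }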

    Emoji in particular can be built out of multiple codepoints, so even in UTF-32 you cannot guarantee that an arbitrary emoji fits in a single char32_t.
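
    For example (a sketch, assuming the source file is saved as UTF-8): the family emoji πŸ‘¨β€πŸ‘©β€πŸ‘§ is a zero-width-joiner sequence of five codepoints, so even a char32_t string needs five code units for what renders as one glyph:

    #include <iostream>

    int main() {
      // πŸ‘¨β€πŸ‘©β€πŸ‘§ is U+1F468 U+200D U+1F469 U+200D U+1F467: five codepoints
      // (man, zero-width joiner, woman, zero-width joiner, girl)
      // that render as a single glyph.
      const char32_t family[] = U"πŸ‘¨β€πŸ‘©β€πŸ‘§";

      std::cout << (sizeof(family) / sizeof(char32_t)) - 1  // prints 5
                << " char32_t values for one visible emoji" << std::endl;

      return 0;
    }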

    It's best to treat these things as strings, not characters, at all times. Forget that "characters" even exist.
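
    In that spirit, one way to answer the original question (a sketch, assuming the terminal expects UTF-8): store the emoji as a UTF-8 string and write its raw bytes, since C++20 deliberately deletes operator<< for char8_t, char16_t and char32_t:

    #include <iostream>

    int main() {
      // Store the emoji as a UTF-8 string, not a single character.
      const char8_t* s = u8"πŸ˜‹";

      // C++20 deletes operator<< for char8_t strings to prevent mojibake,
      // so write the bytes through a plain char pointer instead.
      // This prints correctly only if the terminal is set to UTF-8.
      std::cout << reinterpret_cast<const char*>(s) << std::endl;

      return 0;
    }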