c++ character-encoding utf8mb4

utf8mb4 encode/decode in c++


A third-party server echoes a string to my client program. The string contains both UTF-8 data and Unicode emoji (listed here). For example:

eg

I searched for a while and found that this is called utf8mb4 encoding, which is used in SQL applications.

I found some articles about utf8mb4 in MySQL/Python/Ruby/etc., but none for C++. Is there a C++ library that can encode/decode utf8mb4?


Solution

  • What MySQL calls utf8mb4 is, in fact, standard UTF-8:

    The character set named utf8 uses a maximum of three bytes per character and contains only BMP characters. As of MySQL 5.5.3, the utf8mb4 character set uses a maximum of four bytes per character and supports supplemental characters.

    So any library that correctly implements UTF-8 (including the 4-byte sequences used for code points above U+FFFF, such as most emoji) will handle utf8mb4 data. This question asks what solutions exist in C++ for converting to and from UTF-8: How to work with UTF-8 in C++, Conversion from other Encodings to UTF-8. The three solutions given there are ICU (International Components for Unicode), Boost.Locale, and C++11.