Using std::format for formatting char8_t, char16_t and char32_t text in C++20


I am working on cross-platform engine code. For storing and transferring text, unsigned short was the most convenient type: on Windows it is the same size as wchar_t, and on Mac it matches the unichar that NSString relies on (wchar_t on Mac is, for some reason, 32 bits wide). With C++20 I replaced my generic unsigned short type with char16_t and started optimizing the code, sharing it between both OSes (Mac and Windows) wherever possible. char16_t worked great until I got to the formatting part and found that std::format does not support it yet.

I tried many solutions and ended up using the fmt library, which supports formatting char16_t, only to see that its code is very similar to the std::format code (std::format is based on fmt, and the same person designed both). That made me wonder whether it is possible to make std::format work with all character types.

After some work providing the missing code (formatter templates for the other character types), I only got as far as the famous error C2491: 'std::numpunct<_Elem>::id' : definition of dllimport static data member not allowed.
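For context on why numpunct shows up at all: locale-aware formatting has to ask a std::numpunct facet for things like the digit-group separator, and the standard only requires that facet for char and wchar_t. A minimal sketch, where thousands_separator is a hypothetical helper name and not something from the MSVC headers:

```cpp
#include <locale>

// Hypothetical helper: asks the classic "C" locale for the digit-group
// separator of a given character type.
// The standard only requires std::numpunct<char> and std::numpunct<wchar_t>,
// so instantiating this with char16_t is not portable: depending on the
// implementation it may fail to build or throw std::bad_cast at run time.
template <class CharT>
CharT thousands_separator() {
    const auto& facet =
        std::use_facet<std::numpunct<CharT>>(std::locale::classic());
    return facet.thousands_sep();
}
```

Calling thousands_separator<char>() or thousands_separator<wchar_t>() works on any conforming implementation; the charN_t versions are the ones that have no guaranteed facet behind them.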

My last approach was to copy all the code from the standard format header into a new file (changing its namespace to fmt so that it doesn't conflict with the std versions) and also to copy the numpunct class from the xlocnum header. That got it working: my new fmt::std__format now formats all character types, seemingly without any visible issue.

To be honest, I didn't expect to get it working, so now I wonder whether I missed anything. If these changes are all that is needed to make std::format work for all character types, then I am really not sure why this support hasn't been added.

The additions were simple, and everything needed was already in the existing code: I only had to add new template versions of functions and classes for the new character types. For example, the _Decode_utf function in the format header has versions for char, wchar_t and char32_t, so only char8_t and char16_t versions were needed. The solution was simple: for char8_t, reuse the char version (for some reason the char version does full UTF-8 encoding/decoding, probably for backward compatibility with code that used char for UTF-8), and for char16_t, reuse the wchar_t version, which worked perfectly (at least on Windows; Mac is still to be tested).
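To illustrate what a char16_t decoding step has to do, here is a rough sketch. It is not the MSVC _Decode_utf code; decode_utf16 is a hypothetical name, and replacing unpaired surrogates with U+FFFD is my assumption about reasonable error handling:

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (not part of any standard header): decodes UTF-16 code
// units into Unicode code points. Unpaired surrogates become U+FFFD.
std::u32string decode_utf16(std::u16string_view in) {
    std::u32string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char32_t unit = in[i];
        const bool is_high = unit >= 0xD800 && unit <= 0xDBFF;
        const bool is_low = unit >= 0xDC00 && unit <= 0xDFFF;
        const bool next_is_low = i + 1 < in.size() &&
                                 in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF;
        if (is_high && next_is_low) {
            // A valid surrogate pair encodes one code point above U+FFFF.
            const char32_t low = in[++i];
            out.push_back(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00));
        } else if (is_high || is_low) {
            out.push_back(0xFFFD);  // lone surrogate
        } else {
            out.push_back(unit);    // BMP code point, stored as-is
        }
    }
    return out;
}
```

The same logic applies to wchar_t only where wchar_t holds UTF-16 (as on Windows), which is why reusing the wchar_t version cannot be assumed to carry over to platforms with a 32-bit wchar_t.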

My question is: am I wasting my time here? Is there some big reason why std::format doesn't work with the new character types that I could not see or detect in the code (a lot of the code in the format header definitely supports the new character types)?


Solution

  • std::format() in C++23 and earlier does not support character types other than char and wchar_t. Conceptually, extending it to support the other character types is not difficult, but there are some technical hurdles that need to be overcome. For example, std::locale facets are not specified for the charN_t types and std::format() carries a dependency on std::locale for some operations.

    SG16 has an issue tracking extending support for the charN_t types at https://github.com/sg16-unicode/sg16/issues/68.