Tags: c++, string, unicode, type-conversion

How does one convert std::u16string -> std::wstring using <codecvt>?


I found a bunch of questions on similar topics, but nothing regarding wide-to-wide conversion with <codecvt>, which is supposed to be the correct choice in modern code.

The std::codecvt_utf16<wchar_t> facet seems like a logical choice to perform the conversion.

However, std::wstring_convert seems to expect a std::string at one end; its from_bytes and to_bytes methods emphasize this purpose.

The best solution I have found so far is something like std::copy, which might work for my specific case, but seems rather low-tech and probably not entirely correct either.

I have a strong feeling that I am missing something rather obvious.

Cheers.


Solution

  • The std::wstring_convert and std::codecvt... classes have been deprecated since C++17. There is no longer a standard way to convert between the various string classes.

    If your compiler still supports the classes, you can certainly use them. However, you cannot convert directly from std::u16string to std::wstring (or vice versa) with them. You will have to convert to an intermediate UTF-8 std::string first, and then convert that, e.g.:

    #include <codecvt>  // deprecated since C++17, but still shipped by most implementations
    #include <locale>
    #include <string>
    
    std::u16string utf16 = ...;
    
    // UTF-16 -> UTF-8
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> utf16conv;
    std::string utf8 = utf16conv.to_bytes(utf16);
    
    // UTF-8 -> wchar_t encoding
    std::wstring_convert<std::codecvt_utf8<wchar_t>, wchar_t> wconv;
    std::wstring wstr = wconv.from_bytes(utf8);
    

    Just know that this approach will break when the classes are eventually dropped from the standard library.

    Using std::copy() (or simply the various std::wstring constructors/assign methods) will work only on Windows, where wchar_t and char16_t are both 16 bits in size and represent UTF-16:

    std::u16string utf16 = ...;
    std::wstring wstr;
    
    #ifdef _WIN32
    wstr.reserve(utf16.size());
    std::copy(utf16.begin(), utf16.end(), std::back_inserter(wstr));
    /*
    or: wstr = std::wstring(utf16.begin(), utf16.end());
    or: wstr.assign(utf16.begin(), utf16.end());
    or: wstr = std::wstring(reinterpret_cast<const wchar_t*>(utf16.c_str()), utf16.size());
    or: wstr.assign(reinterpret_cast<const wchar_t*>(utf16.c_str()), utf16.size());
    */
    #else
    // do something else ...
    #endif
    

    But, on other platforms, where wchar_t is 32 bits in size and represents UTF-32, you will need to actually convert the data, using the code shown above, a platform-specific API, or a 3rd-party Unicode library that can perform the conversion, such as libiconv, ICU, etc.
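    If you prefer to avoid a dependency, a hand-rolled UTF-16 -> UTF-32 decoder is also an option. Below is a minimal sketch, assuming wchar_t is 32 bits and holds UTF-32 code points (as on Linux and macOS); the helper name utf16_to_wide is made up for illustration, and malformed input is reported by throwing:

```cpp
// A sketch of a manual UTF-16 -> UTF-32 conversion, assuming wchar_t
// is 32 bits wide and holds UTF-32 (true on Linux/macOS, NOT Windows).
// The helper name utf16_to_wide is hypothetical, not a standard API.
#include <cstddef>
#include <stdexcept>
#include <string>

std::wstring utf16_to_wide(const std::u16string& utf16)
{
    std::wstring out;
    out.reserve(utf16.size());
    for (std::size_t i = 0; i < utf16.size(); ++i) {
        char16_t c = utf16[i];
        if (c >= 0xD800 && c <= 0xDBFF) {
            // High surrogate: must be followed by a low surrogate.
            if (i + 1 >= utf16.size())
                throw std::runtime_error("truncated surrogate pair");
            char16_t low = utf16[++i];
            if (low < 0xDC00 || low > 0xDFFF)
                throw std::runtime_error("invalid low surrogate");
            // Combine the pair into a single code point >= U+10000.
            char32_t cp = 0x10000 + ((char32_t(c) - 0xD800) << 10)
                                  + (char32_t(low) - 0xDC00);
            out.push_back(static_cast<wchar_t>(cp));
        } else if (c >= 0xDC00 && c <= 0xDFFF) {
            throw std::runtime_error("unpaired low surrogate");
        } else {
            // BMP code point: copies straight through.
            out.push_back(static_cast<wchar_t>(c));
        }
    }
    return out;
}
```

    This mirrors what the libraries above do internally for this specific pair of encodings, minus their broader error-handling and platform coverage.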