Tags: c++, c++11, unicode, uppercase, ctype

How to uppercase a u32string (char32_t) with a specific locale?


On Windows with Visual Studio 2017 I can use the following code to uppercase a u32string (which is based on char32_t):

#include <locale>
#include <iostream>
#include <string>

void toUpper(std::u32string& u32str, std::string localeStr)
{
    std::locale locale(localeStr);

    for (unsigned i = 0; i<u32str.size(); ++i)
        u32str[i] = std::toupper(u32str[i], locale);
}

The same code does not compile on macOS with Xcode. I get errors like this:

/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/__locale:795:44: error: implicit instantiation of undefined template 'std::__1::ctype<char32_t>'
return use_facet<ctype<_CharT> >(__loc).toupper(__c);

Is there a portable way of doing this?


Solution

  • I have found a solution:

    Instead of std::u32string I now use std::string with UTF-8 encoding. Conversion from std::u32string to a UTF-8 std::string can be done with utf8-cpp: http://utfcpp.sourceforge.net/
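For readers who prefer not to add a dependency, the UTF-32 to UTF-8 step can also be written by hand. The following is only a minimal sketch, not the utf8-cpp API: the function name `utf32ToUtf8` is hypothetical, and replacing invalid code points with U+FFFD is an assumption made for this example.

```cpp
#include <string>

// Hypothetical helper (not part of utf8-cpp): encodes UTF-32 code points as UTF-8.
// Assumption: surrogates and values above U+10FFFF are replaced by U+FFFD.
std::string utf32ToUtf8(const std::u32string& in)
{
    std::string out;
    for (char32_t cp : in) {
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            cp = 0xFFFD;

        if (cp < 0x80) {
            // 1 byte: 0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

A call such as `utf32ToUtf8(U"\u00E4")` yields the two bytes 0xC3 0xA4, which can then be passed to a UTF-8 based uppercase function.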

    To change the case, the UTF-8 string has to be converted to std::wstring first, because std::toupper is not implemented for char32_t on all platforms.

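    #include <codecvt>
    #include <iostream>
    #include <locale>
    #include <string>

    void toUpper(std::string& str, const std::string& localeStr)
    {
        // UTF-8 <-> wide string converter (std::wstring_convert is deprecated since C++17)
        std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;

        // convert to wstring (because std::toupper is not implemented on all platforms for u32string);
        // characters outside the BMP become surrogate pairs and are left unchanged by toupper
        std::wstring wide = converter.from_bytes(str);

        // a default-constructed locale is a copy of the global locale; it is the fallback
        std::locale locale;

        try
        {
            locale = std::locale(localeStr);
        }
        catch (const std::exception&)
        {
            // getLocaleByLanguage() is a helper from my own code and is not shown here
            std::cerr << "locale not supported by system: " << localeStr << " (" << getLocaleByLanguage(localeStr) << ")" << std::endl;
        }

        const auto& f = std::use_facet<std::ctype<wchar_t>>(locale);

        // uppercase the whole buffer in place
        f.toupper(&wide[0], &wide[0] + wide.size());

        // convert back to UTF-8
        str = converter.to_bytes(wide);
    }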
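If the data should stay in a std::u32string, the same idea (going through std::ctype<wchar_t>, which every standard library has to provide) can be applied per code point. This is only a sketch: the name `toUpperU32` is hypothetical, and it assumes that wchar_t values are Unicode code points, which holds on Windows (BMP only), Linux and macOS. Code points that do not fit into wchar_t are left unchanged, and only one-to-one case mappings are handled.

```cpp
#include <limits>
#include <locale>
#include <string>

// Hypothetical helper: uppercases a UTF-32 string one code point at a time
// through the std::ctype<wchar_t> facet of the given locale.
std::u32string toUpperU32(std::u32string text, const std::locale& loc)
{
    const auto& facet = std::use_facet<std::ctype<wchar_t>>(loc);

    // Largest value wchar_t can hold (0xFFFF on Windows, larger elsewhere).
    const char32_t wideMax =
        static_cast<char32_t>(std::numeric_limits<wchar_t>::max());

    for (char32_t& cp : text) {
        // Code points that do not fit into wchar_t are skipped.
        if (cp <= wideMax)
            cp = static_cast<char32_t>(facet.toupper(static_cast<wchar_t>(cp)));
    }
    return text;
}
```

With the classic "C" locale, `toUpperU32(U"abc", std::locale::classic())` returns `U"ABC"`. Which non-ASCII letters are converted depends on the locale passed in and on the platform.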
    

    Note:

    • On Windows, localeStr has to be a short name such as en, de, fr, ...
    • On other systems, localeStr has to be a full name such as de_DE, fr_FR, en_US, ...
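Because the accepted locale names differ between platforms, one option is to try several candidate names in order and fall back to the classic "C" locale when none is installed. The sketch below illustrates that idea. The helper name `makeLocale` is hypothetical, and the fallback choice is an assumption rather than part of the solution above.

```cpp
#include <locale>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: returns the first locale from `candidates` that the
// system supports, or the classic "C" locale if none of them is available.
std::locale makeLocale(const std::vector<std::string>& candidates)
{
    for (const std::string& name : candidates) {
        try {
            return std::locale(name);
        } catch (const std::runtime_error&) {
            // std::locale throws std::runtime_error for unknown names; try the next one.
        }
    }
    return std::locale::classic();
}
```

A caller could then write `std::locale loc = makeLocale({"de_DE.UTF-8", "de_DE", "de"});` and pass `loc` on, so the same call site works with both naming schemes.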