Convert ICU Unicode string to std::wstring (or wchar_t*)


Is there an ICU function to create a std::wstring from an ICU UnicodeString? I have been searching the ICU manual but haven't been able to find one.

(I know I can convert the UnicodeString to UTF-8 and then convert that to the platform-dependent wchar_t*, but I am looking for a single function in UnicodeString that can do this conversion.)


Solution

  • The C++ standard doesn't dictate any specific encoding for std::wstring. On Windows systems, wchar_t is 16-bit, and on Linux, macOS, and several other platforms, wchar_t is 32-bit. As far as C++'s std::wstring is concerned, it is just an arbitrary sequence of wchar_t in much the same way that std::string is just an arbitrary sequence of char.

    It seems that icu::UnicodeString has no in-built way of creating a std::wstring, but if you really want to create a std::wstring anyway, you can use the C-based API u_strToWCS() like this:

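    #include <string>
    #include <unicode/unistr.h>
    #include <unicode/ustring.h>
    
    icu::UnicodeString ustr = /* get from somewhere */;
    std::wstring wstr;
    
    int32_t requiredSize = 0;
    UErrorCode error = U_ZERO_ERROR;
    
    // obtain the size of string we need (this preflight call reports U_BUFFER_OVERFLOW_ERROR by design)
    u_strToWCS(nullptr, 0, &requiredSize, ustr.getBuffer(), ustr.length(), &error);
    
    // ICU functions return immediately if the error code already holds a failure, so reset it first
    error = U_ZERO_ERROR;
    
    // resize accordingly (this will not include any terminating null character, but it also doesn't need to either)
    wstr.resize(requiredSize);
    
    // copy the UnicodeString buffer to the std::wstring (&wstr[0] also works before C++17, unlike non-const data())
    u_strToWCS(&wstr[0], static_cast<int32_t>(wstr.size()), nullptr, ustr.getBuffer(), ustr.length(), &error);
    
    if (U_FAILURE(error)) { /* handle the conversion error */ }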
    

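    Supposedly, u_strToWCS() picks the most efficient conversion from UChar to wchar_t that the platform allows: if the two types are the same size, it is presumably just a straightforward copy, and if wchar_t is 32-bit, surrogate pairs have to be combined into single code points.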
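
To make the size-dependent behaviour described above more concrete, here is a minimal sketch that uses only the standard library. The helper name utf16_to_wstring is hypothetical (it is not part of ICU), and the sketch assumes that wchar_t holds UTF-16 when it is 16 bits wide and UTF-32 when it is 32 bits wide. ICU's actual implementation may well differ; this only illustrates the idea.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not an ICU function: converts UTF-16 code units to std::wstring.
// Assumption: wchar_t is UTF-16 when 2 bytes wide and UTF-32 when 4 bytes wide.
std::wstring utf16_to_wstring(const std::u16string& src) {
    std::wstring out;
    out.reserve(src.size());

    if (sizeof(wchar_t) == sizeof(char16_t)) {
        // Same width (e.g. Windows): copy the code units one to one.
        for (char16_t cu : src) {
            out.push_back(static_cast<wchar_t>(cu));
        }
        return out;
    }

    // Wider wchar_t (e.g. Linux, macOS): combine surrogate pairs into code points.
    for (std::size_t i = 0; i < src.size(); ++i) {
        char32_t cp = src[i];
        const bool isLead = cp >= 0xD800 && cp <= 0xDBFF;
        if (isLead && i + 1 < src.size() &&
            src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (src[i + 1] - 0xDC00);
            ++i;  // the trail surrogate has been consumed
        }
        // Unpaired surrogates are passed through unchanged in this sketch.
        out.push_back(static_cast<wchar_t>(cp));
    }
    return out;
}
```

With a recent ICU version, where UChar is char16_t, the input could probably be built as std::u16string(ustr.getBuffer(), ustr.length()). In real code u_strToWCS() remains the better choice, since it also covers platforms where wchar_t uses some other encoding.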