WideCharToMultiByte doesn't work in Wine


I'm trying to use WideCharToMultiByte to convert a std::wstring to a UTF-8 std::string. Here is my code:

const std::wstring & utf16("lorem ipsum"); // input

if (utf16.empty()) {
    return "";
}
cout << "wstring -> string, input: , size: " << utf16.size() << endl;
for (size_t i = 0; i < utf16.size(); ++i) {
    cout << i << ": " << static_cast<int>(utf16[i]) << endl;
}
for (size_t i = 0; i < utf16.size(); ++i) {
    wcout << static_cast<wchar_t>(utf16[i]);
}
cout << endl;
std::string res;
int required_size = 0;
if ((required_size = WideCharToMultiByte(
    CP_UTF8,
    0,
    utf16.c_str(),
    utf16.size(),
    nullptr,
    0,
    nullptr,
    nullptr
)) == 0) {
    throw std::invalid_argument("Cannot convert.");
}
cout << "required size: " << required_size << endl;
res.resize(required_size);
if (WideCharToMultiByte(
    CP_UTF8,
    0,
    utf16.c_str(),
    utf16.size(),
    &res[0],
    res.size(),
    nullptr,
    nullptr
) == 0) {
    throw std::invalid_argument("Cannot convert.");
}
cout << "Result: " << res << ", size: " << res.size() << endl;
for (size_t i = 0; i < res.size(); ++i) {
    cout << i << ": " << (int)static_cast<uint8_t>(res[i]) << endl;
}
exit(1);
return res;

It runs without exceptions or errors, but the result is wrong. Here is the output from running the code:

wstring -> string, input: , size: 11
0: 108
1: 111
2: 114
3: 101
4: 109
5: 32
6: 105
7: 112
8: 115
9: 117
10: 109
lorem ipsum
required size: 11
Result: lorem , size: 11
0: 108
1: 0
2: 111
3: 0
4: 114
5: 0
6: 101
7: 0
8: 109
9: 0
10: 32

I don't understand why the null bytes are there. What am I doing wrong?


Solution

  • Summarizing from comments:

    Your code is correct as far as the WideCharToMultiByte logic and arguments go; the only actual problem is the initialization of utf16, which needs a wide literal (L"lorem ipsum"). With that fixed, the code gives the expected results with both VC++ 2015 RTM and Update 1, so this is a bug in the WideCharToMultiByte implementation of the emulation layer you're using (Wine).
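
As a side observation on the symptom: the bytes in the question's output (108, 0, 111, 0, ...) are what you would see if the first 11 bytes of the UTF-16LE buffer were copied unchanged instead of being converted. The sketch below reproduces that byte layout portably. The helper name first_utf16le_bytes is hypothetical, and this only illustrates the pattern; it is not a confirmed diagnosis of what the emulation layer does internally.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: serialize UTF-16 code units as little-endian bytes
// and keep only the first `count` bytes, with no conversion to UTF-8.
std::vector<std::uint8_t> first_utf16le_bytes(const std::u16string& text, std::size_t count)
{
    std::vector<std::uint8_t> bytes;
    for (char16_t unit : text) {
        bytes.push_back(static_cast<std::uint8_t>(unit & 0xFF));        // low byte first
        bytes.push_back(static_cast<std::uint8_t>((unit >> 8) & 0xFF)); // then high byte
    }
    if (bytes.size() > count)
        bytes.resize(count); // truncate to the size reported by the first call
    return bytes;
}
```

Calling first_utf16le_bytes(u"lorem ipsum", 11) yields 108, 0, 111, 0, 114, 0, 101, 0, 109, 0, 32, which is the same sequence as the reported result.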

    That said, from C++11 onwards there is a portable solution you should prefer when possible: std::wstring_convert in conjunction with std::codecvt_utf8_utf16 (note that both are deprecated as of C++17):

    #include <cstddef>
    #include <string>
    #include <locale>
    #include <codecvt>
    #include <iostream>
    
    std::string test(std::wstring const& utf16)
    {
        std::wcout << L"wstring -> string, input: " << utf16 << L", size: " << utf16.size() << L'\n';
        for (std::size_t i{}; i != utf16.size(); ++i)
            std::wcout << i << L": " << static_cast<int>(utf16[i]) << L'\n';
        for (std::size_t i{}; i != utf16.size(); ++i)
            std::wcout << utf16[i];
        std::wcout << L'\n';
    
        std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> cvt;
        std::string res = cvt.to_bytes(utf16);
        std::wcout << L"Result: " << res.c_str() << L", size: " << res.size() << L'\n';
        for (std::size_t i{}; i != res.size(); ++i)
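            // Go through unsigned char so bytes >= 0x80 do not print as negative values.
            std::wcout << i << L": " << static_cast<int>(static_cast<unsigned char>(res[i])) << L'\n';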
        return res;
    }
    
    int main()
    {
        test(L"lorem ipsum");
    }
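
Putting both suggestions together, a minimal sketch might wrap the conversion in a single helper: the two-pass WideCharToMultiByte pattern from the question on Windows, and std::wstring_convert elsewhere. The name to_utf8 and the _WIN32 split are assumptions made for illustration, not part of the original code, and nothing here has been verified under Wine.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

#ifdef _WIN32
#  include <windows.h>
#else
#  include <codecvt>
#  include <locale>
#endif

// Hypothetical helper: convert UTF-16 text to a UTF-8 std::string.
std::string to_utf8(const std::wstring& utf16)
{
    if (utf16.empty())
        return std::string();
#ifdef _WIN32
    // The API takes the length as int, so the size_t is narrowed explicitly.
    const int length = static_cast<int>(utf16.size());
    // First pass: ask how many bytes the UTF-8 output needs.
    const int required = WideCharToMultiByte(
        CP_UTF8, 0, utf16.data(), length, nullptr, 0, nullptr, nullptr);
    if (required == 0)
        throw std::invalid_argument("Cannot convert.");
    std::string result(static_cast<std::size_t>(required), '\0');
    // Second pass: write the bytes into the pre-sized buffer.
    if (WideCharToMultiByte(
            CP_UTF8, 0, utf16.data(), length, &result[0], required, nullptr, nullptr) == 0)
        throw std::invalid_argument("Cannot convert.");
    return result;
#else
    // Portable fallback; deprecated since C++17 but still widely available.
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
    return converter.to_bytes(utf16);
#endif
}
```

A call such as to_utf8(L"lorem ipsum") takes a wide literal, which is the fix the question's initialization needed.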
    
