I use MinGW 8.1.0 64-bit. This code snippet:
#include <clocale>
#if __has_include(<codecvt>)
#include <codecvt>
#endif
#include <cstdlib>
#include <locale>
#include <string>
#include <wchar.h>
#include <iostream>

int main() {
    auto utf8_decode = [](const std::string &str) -> std::wstring {
        std::wstring_convert<std::codecvt_utf8<wchar_t>> myconv;
        return myconv.from_bytes(str);
    };
    std::string test = "=";
    auto s = utf8_decode(test);
    std::wcout << s << std::endl;
    return 0;
}
prints a CJK ideograph (or some other gibberish) on Windows, but prints = (as expected) on Linux.
Is this a bug in the standard library, or am I missing something?
This does look like a bug in MinGW's libstdc++.dll: codecvt incorrectly picks big-endian, so = (0x3d) becomes 㴀 (0x3d00).
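To make the arithmetic concrete, here is a minimal sketch of what the wrong byte order does to a single 16-bit code unit. The helper name swap_bytes16 is made up for illustration and is not part of any library:

```cpp
#include <cstdint>

// Hypothetical helper (not from the original post): swaps the two bytes
// of a 16-bit code unit, which is what interpreting it with the wrong
// endianness amounts to.
constexpr std::uint16_t swap_bytes16(std::uint16_t unit) {
    return static_cast<std::uint16_t>((unit << 8) | (unit >> 8));
}

// '=' is U+003D; with its bytes swapped it reads as U+3D00, the CJK ideograph.
static_assert(swap_bytes16(0x003D) == 0x3D00, "'=' turns into U+3D00");
```

Swapping twice gives back the original value, which is why forcing the opposite byte order undoes the damage.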
Proposed workaround: force little-endian explicitly by using std::codecvt_utf8<wchar_t, 0x10ffff, std::little_endian>.
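Applied to the lambda from the question, the workaround might look like the sketch below. It is written as a free function for brevity. It assumes that the explicit std::little_endian flag only matters on the affected MinGW build and is effectively ignored elsewhere (for example on Linux), and I have only reasoned this through, not tested it on every toolchain:

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Decodes a UTF-8 string into a wide string. The third template argument
// forces little-endian handling so the affected MinGW libstdc++ does not
// byte-swap each code unit; 0x10ffff is the default maximum code point.
std::wstring utf8_decode(const std::string &str) {
    std::wstring_convert<
        std::codecvt_utf8<wchar_t, 0x10ffff, std::little_endian>> conv;
    return conv.from_bytes(str);  // throws std::range_error on invalid UTF-8
}
```

Note that std::wstring_convert and <codecvt> are deprecated since C++17, so newer compilers may warn about this code even though it still compiles.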