
Bug in gcc wstring_convert?


I use MinGW 8.1.0 64-bit. This code snippet:

#include <clocale>
#if __has_include(<codecvt>)
#include <codecvt>
#endif
#include <cstdlib>
#include <locale>
#include <string>
#include <wchar.h>
#include <iostream>

int main() {
    auto utf8_decode = [](const std::string &str) -> std::wstring {
      std::wstring_convert<std::codecvt_utf8<wchar_t>> myconv;
      return myconv.from_bytes(str);
    };

    std::string test = "=";
    auto s = utf8_decode(test);

    std::wcout << s << std::endl;

    return 0;
}

outputs a hieroglyph (or some other gibberish) on Windows, but outputs = (as expected) on Linux. Is this a bug in the standard library, or am I missing something?


Solution

  • This does look like a bug in MinGW's libstdc++ (libstdc++.dll): codecvt incorrectly chooses big-endian byte order, so = (0x3d) comes out as the code unit 0x3d00, which displays as an unrelated character.

    Proposed workaround: force little-endian explicitly by using codecvt_utf8<wchar_t, 0x10ffff, std::little_endian>.
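
A minimal sketch of the workaround, rewritten as a free function. The names utf8_decode_le and swap_bytes are hypothetical and only used here for illustration; swap_bytes is not part of the fix, it just shows how a byte swap would turn 0x3d into 0x3d00. Note that std::wstring_convert and <codecvt> are deprecated since C++17, so newer compilers may emit deprecation warnings.

```cpp
#include <codecvt>
#include <cstdint>
#include <locale>
#include <string>

// Illustration only (hypothetical helper): swapping the two bytes of a
// 16-bit code unit turns 0x003d ('=') into 0x3d00, matching the symptom.
constexpr std::uint16_t swap_bytes(std::uint16_t u) {
    return static_cast<std::uint16_t>((u << 8) | (u >> 8));
}

// Decode UTF-8 into a wide string, passing std::little_endian explicitly
// as suggested in the answer. On platforms without the bug this should
// behave the same as the default mode.
std::wstring utf8_decode_le(const std::string &str) {
    std::wstring_convert<
        std::codecvt_utf8<wchar_t, 0x10ffff, std::little_endian>> conv;
    return conv.from_bytes(str);
}
```

In the original snippet, replacing the lambda body with a call to utf8_decode_le(test) should be enough to try this out on MinGW.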