Tags: c++, windows, utf-8, iostream, widestring

wcout does not output as desired


I've been trying to write a C++ application for a project and I ran into this issue. Basically:

class OBSClass
{
public:
    wstring ClassName;
    uint8_t Credit;
    uint8_t Level;
    
    OBSClass() : ClassName(), Credit(), Level() {}
    OBSClass(wstring name, uint8_t credit, uint8_t hyear)
    : ClassName(name), Credit(credit), Level(hyear)
    {}
};

In some other file:

vector<OBSClass> AllClasses;
...
AllClasses.push_back(OBSClass(L"Bilişim Sistemleri Mühendisliğine Giriş", 3, 1));
AllClasses.push_back(OBSClass(L"İş Sağlığı ve Güvenliği", 3, 1));
AllClasses.push_back(OBSClass(L"Türk Dili 1", 2, 1));
... (rest omitted; some of the entries have non-ASCII characters like 'ş' and 'İ')

I have a function that basically outputs everything in AllClasses. The problem is that wcout does not output as desired.

void PrintClasses()
{
    for (size_t i = 0; i < AllClasses.size(); i++)
    {
        wcout << "Class: " << AllClasses[i].ClassName << "\n";
    }
}

Output is 'Class: Bili' and nothing else. The program does not even try to output the other entries and just hangs. I am on Windows using G++ 6.3.0. I am not using Windows' cmd; I am using bash from MinGW, so encoding should not be a problem (or should it?). Any advice?

Edit: Also, source code encoding is not a problem. I just checked and it is UTF-8, the default in VSCode.

Edit: I also checked whether the problem is with string literals.

wstring test;
wcin >> test;
wcout << test;

Entered some non-ASCII characters like 'ö' and 'ş', it works perfectly. What is the problem with wide string literals?

Edit: Here you go

#include <iostream>
#include <string>
#include <vector>

using namespace std;

vector<wstring> testvec;

int main()
{
    testvec.push_back(L"Bilişim Sistemleri Mühendisliğine Giriş");
    testvec.push_back(L"ıiÖöUuÜü");
    testvec.push_back(L"☺☻♥♦♣♠•◘○");
    for (size_t i = 0; i < testvec.size(); i++)
        wcout << testvec[i] << "\n";
    return 0;
}

Compile with G++: g++ file.cc -O3

This code only outputs 'Bili'. It must be something with the g++ screwing up binary encoding (?), since entering values with wcin then outputting them with wcout does not generate any problem.


Solution

  • The following code works for me using MinGW-w64 7.3.0, in both MSYS2 Bash and Windows CMD, with the source encoded as UTF-8:

    #include <iostream>
    #include <locale>
    #include <string>
    #include <codecvt>
    
    int main()
    {
        std::ios_base::sync_with_stdio(false);
    
        std::locale utf8( std::locale(), new std::codecvt_utf8_utf16<wchar_t> );
        std::wcout.imbue(utf8);
    
        std::wstring w(L"Bilişim Sistemleri Mühendisliğine Giriş");
        std::wcout << w << '\n';
    }
    

    Explanation:

    • The Windows console doesn't support any sort of 16-bit output; it only supports ANSI, plus partial UTF-8. ANSI is the default for backwards compatibility purposes, though Windows 10 1803 does add an option to set that to UTF-8 (ref). So you need to configure wcout to convert its output to UTF-8.
    • Calling imbue with a codecvt_utf8_utf16 facet achieves this. However, you also need to disable sync_with_stdio; otherwise the stream doesn't even use the facet and just defers to stdout, which has a similar problem.

    For writing to other files, I found that the same technique works for writing UTF-8. To write a UTF-16 file, you need to imbue the wofstream with a UTF-16 facet (see example here) and manually write a BOM.


    Commentary: Many people just avoid trying to use wide iostreams completely, due to these issues.

    You can write a UTF-8 file using a narrow stream and, if you are using wstring internally, call a function in your code to convert the wstring to UTF-8 before writing. You can of course also use UTF-8 internally.
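
    As a rough illustration of that conversion step, here is a minimal hand-written sketch. The name to_utf8 is hypothetical and not part of the answer above; it assumes the wstring holds UTF-16 when wchar_t is 16 bits (as on Windows) or UTF-32 when it is 32 bits, and it replaces unpaired surrogates with U+FFFD rather than reporting an error.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not from the answer): encode a std::wstring as UTF-8.
// Works for 16-bit wchar_t (UTF-16) and 32-bit wchar_t (UTF-32).
std::string to_utf8(const std::wstring& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        std::uint32_t cp = static_cast<std::uint32_t>(in[i]);

        // Combine a UTF-16 surrogate pair into a single code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size())
        {
            std::uint32_t lo = static_cast<std::uint32_t>(in[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF)
            {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }

        // Unpaired surrogates and out-of-range values become U+FFFD.
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            cp = 0xFFFD;

        if (cp < 0x80)
        {
            out += static_cast<char>(cp);
        }
        else if (cp < 0x800)
        {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else if (cp < 0x10000)
        {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else
        {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

    With a helper like this, the result can be written through any narrow stream, for example an std::ofstream opened with std::ios::binary, so no wide stream or facet is involved.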

    Of course you can also write a UTF-16 file using a narrow stream, just not with operator<< from a wstring.