
How to output and input UTF8 or UTF16 Unicode text in Windows using C++?


This is my program:

#include <iostream>
#include <string>
#include <locale>
#include <clocale>
#include <codecvt>
#include <io.h>
#include <fcntl.h>

int main()
{
    fflush(stdout);
    _setmode(_fileno(stdout), _O_U16TEXT);
    std::ios_base::sync_with_stdio(false);
    std::setlocale(LC_ALL, "el_GR.utf8");
    std::locale loc{ "el_GR.utf8" };
    std::locale::global(loc);       // apparently this does not set the global locale
    //std::wcout.imbue(loc);
    //std::wcin.imbue(loc);

    std::wstring yes;
    std::wcout << L"It's all good γεια ναί" << L'\n';
    std::wcin >> yes;
    std::wcout << yes << L'\n';
    return 0;
}

Let's say I want to support Greek encodings (for both input and output). This program works perfectly on Linux for various input and output languages if I set the appropriate encoding and, of course, remove the fflush(stdout) and _setmode() calls.

On Windows, this program outputs Greek (and English) correctly when I use std::locale::global(loc), but it will not accept Greek input typed from the keyboard: std::wcout << yes prints gibberish or question marks if I type Greek. Apparently ::global isn't really global on Windows?

So I tried the .imbue() method on wcout and wcin (which also works on Linux), shown commented out above. With either of these two statements enabled, the program compiles and shows me a prompt, but when I type anything and press Enter it simply exits, with no error message.

I have tried a few Windows-specific calls, but that only confused me further. It is not clear to me what I should try on Windows, and when.

So the question is: how can I both input and output Greek text properly on Windows, as in the program above? I use Visual Studio 2017 with the latest updates. Thanks in advance.


Solution

  • As @Eryk Sun mentioned in the comments, I also had to call _setmode(_fileno(stdin), _O_U16TEXT);

    UTF-8 console input on Windows is still (as of 2019) somewhat broken.

    EDIT:

    The above modification wasn't enough. I now do the following whenever I want UTF-8 code page support and Unicode input/output on Windows (read the code comments for more info).

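    #include <cstdio>
    #include <iostream>
    #if defined _WIN32
    #   include <windows.h>  // IsValidCodePage, SetConsoleCP, SetCurrentConsoleFontEx
    #   include <io.h>       // _setmode
    #   include <fcntl.h>    // _O_U16TEXT
    #endif

    int main()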
    {
        fflush( stdout );
    #if defined _MSC_VER
    #   pragma region WIN_UNICODE_SUPPORT_MAIN
    #endif
    #if defined _WIN32
        // switch the console input and output code pages to UTF-8
        if ( !IsValidCodePage( CP_UTF8 ) )
        {
            return GetLastError();
        }
        if ( !SetConsoleCP( CP_UTF8 ) )
        {
            return GetLastError();
        }
        if ( !SetConsoleOutputCP( CP_UTF8 ) )
        {
            return GetLastError();
        }
        
        // change console font - Windows Vista and later only
        HANDLE hStdOut = GetStdHandle( STD_OUTPUT_HANDLE );
        CONSOLE_FONT_INFOEX cfie;
        const auto sz = sizeof( CONSOLE_FONT_INFOEX );
        ZeroMemory( &cfie, sz );
        cfie.cbSize = sz;
        cfie.dwFontSize.Y = 14;
        wcscpy_s( cfie.FaceName,
            L"Lucida Console" );
        SetCurrentConsoleFontEx( hStdOut,
            false,
            &cfie );
            
        // change file stream translation mode
        _setmode( _fileno( stdout ), _O_U16TEXT );
        _setmode( _fileno( stderr ), _O_U16TEXT );
        _setmode( _fileno( stdin ), _O_U16TEXT );
    #endif
    #if defined _MSC_VER
    #   pragma endregion
    #endif
        std::ios_base::sync_with_stdio( false );
        // program:...
    
        return 0;
    }
    

    Guidelines:

    • Set Project Properties -> General -> Character Set to "Use Unicode Character Set".
    • Make sure the terminal font covers the Unicode characters you need (open a console -> Properties -> Font; "Lucida Console" works well on Windows). The code above sets that font automatically.
    • Use std::string and 8-bit chars (UTF-8) inside the application.
    • Use 16-bit chars (wchar_t, std::wstring, etc.) to interact with the Windows console.
    • Use 8-bit chars/strings at the application boundary (e.g. writing to files, interacting with other OSs).
    • Convert std::string/char to std::wstring/wchar_t when calling the Windows APIs.
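
The last guideline can be sketched with the Win32 conversion functions MultiByteToWideChar and WideCharToMultiByte. The helper names utf8_to_wide and wide_to_utf8 are hypothetical, returning an empty string on invalid input is a simplification, and the non-Windows branch (which uses the deprecated std::wstring_convert) is an assumption added only so the sketch builds on other platforms.

```cpp
#include <cstddef>
#include <string>
#ifdef _WIN32
#include <windows.h>
#else
#include <codecvt>
#include <locale>
#endif

// Hypothetical helper: UTF-8 std::string -> std::wstring.
// Returns an empty string if the input is empty or not valid UTF-8.
std::wstring utf8_to_wide(const std::string& utf8)
{
    if (utf8.empty())
        return std::wstring();
#ifdef _WIN32
    const int in_len = static_cast<int>(utf8.size());
    // First call: ask how many UTF-16 code units are needed.
    const int len = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                        utf8.data(), in_len, nullptr, 0);
    if (len <= 0)
        return std::wstring();
    std::wstring wide(static_cast<std::size_t>(len), L'\0');
    // Second call: perform the conversion into the sized buffer.
    MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                        utf8.data(), in_len, &wide[0], len);
    return wide;
#else
    // Fallback for other platforms; the two empty strings are the
    // values returned on conversion failure instead of throwing.
    std::wstring_convert<std::codecvt_utf8<wchar_t>> conv{std::string(), std::wstring()};
    return conv.from_bytes(utf8);
#endif
}

// Hypothetical helper: std::wstring -> UTF-8 std::string.
// Returns an empty string if the input is empty or cannot be converted.
std::string wide_to_utf8(const std::wstring& wide)
{
    if (wide.empty())
        return std::string();
#ifdef _WIN32
    const int in_len = static_cast<int>(wide.size());
    const int len = WideCharToMultiByte(CP_UTF8, WC_ERR_INVALID_CHARS,
                                        wide.data(), in_len, nullptr, 0,
                                        nullptr, nullptr);
    if (len <= 0)
        return std::string();
    std::string utf8(static_cast<std::size_t>(len), '\0');
    WideCharToMultiByte(CP_UTF8, WC_ERR_INVALID_CHARS,
                        wide.data(), in_len, &utf8[0], len,
                        nullptr, nullptr);
    return utf8;
#else
    std::wstring_convert<std::codecvt_utf8<wchar_t>> conv{std::string(), std::wstring()};
    return conv.to_bytes(wide);
#endif
}
```

With helpers like these, the application can keep UTF-8 std::string values internally and convert only at the point where a wide Windows API or the wide console streams are called, for example std::wcout << utf8_to_wide(message).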