c++ io character-encoding console-application

Retrieving valid system encoded strings from std::wcin


Recently I faced the necessity of reading non-ASCII characters from the user. I suspect this is relatively easy when dealing with files, but I wasn't really satisfied with that alone: I want to support both file input and stdin. This is where the question kicks in.

First, I am on Windows. I know that reading from the console is platform dependent, since Windows uses UTF-16 and UNIX uses UTF-8, but I think the same problem may arise on UNIX if I move to it. Here is the snippet of code that I used to capture wide console input:

#include <clocale>   // std::setlocale
#include <iostream>
#include <string>
#include <locale>

template<class T> toBytes(T obj) { ... }

int main() {
    std::setlocale(LC_ALL, "en_US");
    std::wstring ws;
    std::getline(std::wcin, ws);
    for (auto c : ws) {
        std::wcout << toBytes(c) << L' ' << L'(' << (int)c << L", \'" << c << L"\')";
    }
}

But... this does not really work. Here is the output:

[Image: output for a non-transformable character]

For some characters it is able to transform them into ASCII:

[Image: output for a transformable character]

So, please help 😄


Solution

  • I think I more or less found the answer to the question. There is a function in the Microsoft C runtime, _setmode (declared in <io.h>, with the mode constants in <fcntl.h>), that changes the translation mode of a file descriptor. Putting the following two lines at the start of the main function helps:

    _setmode(_fileno(stdin), _O_U16TEXT);
    _setmode(_fileno(stdout), _O_U16TEXT);
    

    There is a caveat, though: surrogate pairs do not seem to work in the console at all. They are encoded properly but, unfortunately, not displayed.
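
To make the two points above a bit more concrete, here is a minimal sketch. The names enableUtf16Console and decodeUtf16 are hypothetical helpers introduced only for illustration; they are not part of the original code. The first wraps the two _setmode calls behind a _WIN32 guard so the file still compiles elsewhere (on other platforms it simply returns false). The second shows what "properly encoded" means for a surrogate pair: two UTF-16 code units that combine into one code point above U+FFFF. It takes a std::u16string to stay portable; the assumption is that on Windows a std::wstring read from std::wcin carries the same UTF-16 code units.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

#ifdef _WIN32
#include <fcntl.h>  // _O_U16TEXT
#include <io.h>     // _setmode
#endif

// Hypothetical helper: switch stdin and stdout to UTF-16 text mode.
// _setmode returns -1 on failure. On non-Windows platforms this sketch
// does nothing and reports false.
inline bool enableUtf16Console() {
#ifdef _WIN32
    return _setmode(_fileno(stdin), _O_U16TEXT) != -1 &&
           _setmode(_fileno(stdout), _O_U16TEXT) != -1;
#else
    return false;
#endif
}

// Hypothetical helper: combine UTF-16 surrogate pairs into code points.
// Unpaired surrogates are passed through unchanged.
inline std::vector<char32_t> decodeUtf16(const std::u16string& units) {
    std::vector<char32_t> out;
    for (std::size_t i = 0; i < units.size(); ++i) {
        char32_t hi = units[i];
        bool isHigh = hi >= 0xD800 && hi <= 0xDBFF;
        if (isHigh && i + 1 < units.size()) {
            char32_t lo = units[i + 1];
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                // 10 bits come from each half, offset by 0x10000.
                out.push_back(0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00));
                ++i;  // the low surrogate has been consumed
                continue;
            }
        }
        out.push_back(hi);
    }
    return out;
}
```

For example, the emoji U+1F604 arrives as two code units, and decodeUtf16 turns them back into a single code point. Whether the console can then display it is a separate matter of the console and its font, which matches the caveat above.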