Recently I faced the necessity of reading non-ASCII characters from the user. I suspect this is relatively easy when dealing with files, but I wasn't really satisfied with that: I want to support both file input and stdin. Here is where the question kicks in.
First, I am using Windows, and I know that reading from the console is platform dependent, since Windows uses UTF-16 and UNIX uses UTF-8; but I think the same problem may arise on UNIX if I move to it. So, here is the snippet of code that I used to capture wide console input:
#include <clocale>
#include <iostream>
#include <locale>
#include <string>

template<class T> std::wstring toBytes(T obj) { ... }

int main() {
    std::setlocale(LC_ALL, "en_US");
    std::wstring ws;
    std::getline(std::wcin, ws);
    for (auto c : ws) {
        std::wcout << toBytes(c) << L' ' << L'(' << (int)c << L", '" << c << L"')";
    }
}
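(The body of toBytes is not important for the question; it is just a helper that turns a value into a printable byte dump. A minimal sketch of what such a helper could look like, purely for context and only my assumption about its intent:)

#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Sketch of a byte-dump helper; the real toBytes body is omitted above,
// so this is just one possible implementation: print the raw bytes of the
// value as hex, most significant byte first.
template<class T>
std::wstring toBytes(T obj) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&obj);
    std::wostringstream out;
    for (std::size_t i = sizeof(T); i-- > 0; ) {   // highest address first
        out << std::hex << std::setw(2) << std::setfill(L'0')
            << static_cast<unsigned>(p[i]);
        if (i != 0) out << L' ';
    }
    return out.str();
}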
But... this does not really work out; here is the output:
For some characters it is able to transform them into ASCII:
So, please help 😄
I think I kind of found an answer to the question. There is a function in the Microsoft CRT ("io.h" & "fcntl.h", to be exact), _setmode, for changing the translation mode of a file descriptor. So, putting the two following lines at the start of the main function helps:
_setmode(_fileno(stdin), _O_U16TEXT);
_setmode(_fileno(stdout), _O_U16TEXT);
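Put together, a minimal sketch of the whole program with this fix (I dropped the toBytes call to keep it self-contained; the setlocale call should not be needed any more once the streams are in UTF-16 mode):

#include <fcntl.h>   // _O_U16TEXT
#include <io.h>      // _setmode, _fileno
#include <cstdio>    // stdin, stdout
#include <iostream>
#include <string>

int main() {
    // Switch the underlying C file descriptors to UTF-16 mode so that
    // std::wcin / std::wcout exchange real wide characters with the console.
    _setmode(_fileno(stdin),  _O_U16TEXT);
    _setmode(_fileno(stdout), _O_U16TEXT);

    std::wstring ws;
    std::getline(std::wcin, ws);
    for (wchar_t c : ws) {
        std::wcout << L'(' << static_cast<int>(c) << L", '" << c << L"')\n";
    }
}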
But there is a caveat: surrogate pairs do not seem to work in the console at all; they are properly encoded, but unfortunately not displayed.
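To show what I mean by "properly encoded": a character outside the BMP, e.g. 😄 (U+1F604), comes out of std::wcin as two wchar_t values, and that pair recombines into the original code point as below (just an illustration of the arithmetic, not part of the fix):

#include <cstdint>
#include <iostream>

int main() {
    // UTF-16 surrogate pair for U+1F604 (😄), as it arrives in the wstring
    // read from the console.
    std::uint16_t high = 0xD83D;   // high (lead) surrogate
    std::uint16_t low  = 0xDE04;   // low (trail) surrogate

    // Recombining the pair gives back the original code point:
    // code point = 0x10000 + (high - 0xD800) * 0x400 + (low - 0xDC00)
    std::uint32_t codePoint = 0x10000 + (high - 0xD800) * 0x400 + (low - 0xDC00);

    std::wcout << std::hex << L"U+" << codePoint << L'\n';   // prints U+1f604
}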