After reading std::wstring VS std::string, I was under the impression that on Linux I don't need to worry about any of the language's wide-character facilities
(things like std::wifstream, std::wofstream, std::wstring, wchar_t, etc.).
This seems to work fine when I use std::string to hold non-ASCII characters, but not when I use char to handle them.
For example, I have a file containing just a Unicode checkmark.
With std::string I can read it in, print it to the terminal, and write it back out to a file:
// ✓ reads in unicode to string
// ✓ outputs unicode to terminal
// ✓ outputs unicode back to the file
#include <iostream>
#include <string>
#include <fstream>
int main(){
    std::ifstream in("in.txt");
    std::ofstream out("out.txt");
    std::string checkmark;
    std::getline(in, checkmark); // size of string is actually 3 even though it holds just 1 Unicode character
    std::cout << checkmark << std::endl;
    out << checkmark;
}
However, the same program does not work if I use a char in place of the std::string:
// ✕ only partially reads in unicode to char
// ✕ does not output unicode to terminal
// ✕ does not output unicode back to the file
#include <iostream>
#include <string>
#include <fstream>
int main(){
    std::ifstream in("in.txt");
    std::ofstream out("out.txt");
    char checkmark;
    checkmark = in.get();
    std::cout << checkmark << std::endl;
    out << checkmark;
}
Nothing appears in the terminal (apart from a newline), and the output file contains â instead of the checkmark character.
Since a char is only one byte, I tried a wchar_t instead, but it still does not work:
// ✕ only partially reads in unicode to wchar_t
// ✕ does not output unicode to terminal
// ✕ does not output unicode back to the file
#include <iostream>
#include <string>
#include <fstream>
int main(){
    std::wifstream in("in.txt");
    std::wofstream out("out.txt");
    wchar_t checkmark;
    checkmark = in.get();
    std::wcout << checkmark << std::endl;
    out << checkmark;
}
I've also read about setting the locale with the following call, but it does not appear to make a difference:
setlocale(LC_ALL, "");
In the std::string case you read a whole line, which here contains one multi-byte character: the checkmark U+2713 is encoded in UTF-8 as the three bytes E2 9C 93. In the char case you read a single byte, which is not even one complete character.
Edit: for UTF-8, read each character into an array of char (a UTF-8 character takes up to four bytes), or simply use std::string, since that already works.