I have a JSON file with the following content (for example):
{
"excel_filepath": "excel_file.xlsx",
"line_length": 5.0,
"record_frequency": 2.5,
"report_file_name": "\u041f\u0421 \u041f\u0440\u043e\u043c\u0437\u043e\u043d\u0430 - \u041f\u0421 \u041f\u043e\u0433\u043e\u0440\u0435\u043b\u043e\u0432\u043e (\u0426.1)",
"line_type": 1
}
This JSON file is generated by a Python script.
To read the JSON file, I use the <nlohmann/json.hpp> library (I found it simple for my case):
using json = nlohmann::json;
std::ifstream f("temp_data.json");
json data = json::parse(f);
What I want to do is read the "report_file_name" value and create a simple .txt file named after that value, which is stored as Unicode escape sequences, as you can see.
Here is what I tried:
_setmode(_fileno(stdout), _O_U16TEXT);
const locale utf8_locale = locale(locale(), new codecvt_utf8<wchar_t>());
string report_file_name = data["report_file_name"];
for (auto unicode_char : report_file_name)
{
wcout << typeid(unicode_char).name() << ": " << unicode_char << endl;
}
wofstream report_file(report_file_name + L".txt");
report_file.imbue(utf8_locale);
This produces the following output:
char: Ð
char:
char: Ð
char: ¡
char:
char: Ð
char:
char: Ñ
char:
char: Ð
char: ¾
... and so on
I should note that I did somehow manage to write Cyrillic letters into a report file. Interestingly, when I do:
wcout << L"\u041f\u0421" << endl;
it prints the Cyrillic letters (ПС) correctly. There is also no problem creating the report .txt file with a Cyrillic name written directly in the code:
wofstream report_file(L"Отчет.txt"); // fine!
Am I doing something wrong? I'm using Windows 10 and Visual Studio 2022 with the C++17 standard, if that helps.
Per nlohmann::json's documentation:
https://github.com/nlohmann/json#character-encoding
Character encoding
The library supports Unicode input as follows:
- Only UTF-8 encoded input is supported which is the default encoding for JSON according to RFC 8259.
- std::u16string and std::u32string can be parsed, assuming UTF-16 and UTF-32 encoding, respectively. These encodings are not supported when reading from files or other input containers.
- Other encodings such as Latin-1 or ISO 8859-1 are not supported and will yield parse or serialization errors.
- Unicode noncharacters will not be replaced by the library.
- Invalid surrogates (e.g., incomplete pairs such as \uDEAD) will yield parse errors.
- The strings stored in the library are UTF-8 encoded. When using the default string type (std::string), note that its length/size functions return the number of stored bytes rather than the number of characters or glyphs.
- When you store strings with different encodings in the library, calling dump() may throw an exception unless json::error_handler_t::replace or json::error_handler_t::ignore are used as error handlers.
- To store wide strings (e.g., std::wstring), you need to convert them to a UTF-8 encoded std::string before, see an example.
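The point about bytes versus characters explains the output in the question: iterating a UTF-8 std::string visits individual bytes, so each Cyrillic letter shows up as two unrelated-looking chars (0xD0 prints as "Ð", and so on). A minimal sketch of the difference, where count_code_points is a hypothetical helper written for illustration and not part of nlohmann::json:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a UTF-8 string by
// skipping continuation bytes (those of the form 10xxxxxx).
// Assumes the input is well-formed UTF-8; it does no validation.
std::size_t count_code_points(const std::string &utf8)
{
    std::size_t n = 0;
    for (unsigned char c : utf8)
    {
        if ((c & 0xC0) != 0x80) // not a continuation byte
            ++n;
    }
    return n;
}
```

For the UTF-8 bytes of "ПС" ("\xD0\x9F\xD0\xA1"), size() reports 4 bytes, while count_code_points() reports 2 characters.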
So, in your case, your report_file_name string is a UTF-8 encoded std::string, which you will need to decode into a std::wstring (UTF-16 on Windows, UTF-32 on other platforms) before you can use it with std::wofstream, e.g.:
std::wstring utf8_to_wstr(const std::string &utf8)
{
// there are many questions on Stack Overflow about how to do this conversion.
// You can use the Win32 MultiByteToWideChar() API, or std::wstring_convert
// with std::codecvt_utf8/_utf16, or a 3rd party Unicode library such as
// ICU or iconv...
}
...
wstring report_file_name = utf8_to_wstr(data["report_file_name"]);
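For illustration, here is one possible body for utf8_to_wstr() as a small hand-written decoder. It is only a sketch: it is not the Win32 or std::wstring_convert route mentioned above, it assumes wchar_t holds UTF-16 when it is 2 bytes wide (as on Windows) and UTF-32 otherwise, and it rejects only obviously malformed input (it does not check for overlong encodings or encoded surrogates).

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Sketch of a UTF-8 -> std::wstring decoder (assumptions in the text above).
std::wstring utf8_to_wstr(const std::string &utf8)
{
    std::wstring out;
    std::size_t i = 0;
    while (i < utf8.size())
    {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        std::uint32_t cp = 0;  // decoded code point
        std::size_t extra = 0; // number of continuation bytes expected

        if (lead < 0x80)              { cp = lead;        extra = 0; }
        else if ((lead >> 5) == 0x06) { cp = lead & 0x1F; extra = 1; }
        else if ((lead >> 4) == 0x0E) { cp = lead & 0x0F; extra = 2; }
        else if ((lead >> 3) == 0x1E) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (extra >= utf8.size() - i)
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k)
        {
            const unsigned char c = static_cast<unsigned char>(utf8[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        if (sizeof(wchar_t) == 2 && cp > 0xFFFF)
        {
            // 16-bit wchar_t: encode as a UTF-16 surrogate pair.
            cp -= 0x10000;
            out += static_cast<wchar_t>(0xD800 + (cp >> 10));
            out += static_cast<wchar_t>(0xDC00 + (cp & 0x3FF));
        }
        else
        {
            out += static_cast<wchar_t>(cp);
        }
    }
    return out;
}
```

With MSVC, the resulting wstring plus L".txt" can be passed to std::wofstream as in the question. In C++17, std::filesystem::u8path is another option that builds the path directly from the UTF-8 std::string.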