Tags: c++, windows, qt, qstring

std::wstring to QString conversion with Hiragana


I'm trying to convert text containing Hiragana from a std::wstring to a QString so that I can use it as a label's text. However, my code doesn't work, and I'm not sure why.

The following conversion clearly shows that I did something wrong:

std::wstring myWString = L"Some Hiragana: あ い う え お";
ui->label->setText(QString::fromStdWString(myWString));

Output: Some Hiragana: ã‚ ã„ ã† ãˆ ãŠ

However, the Hiragana display correctly on the label if I pass a plain string literal directly:

ui->label->setText("Some Hiragana: あ い う え お");

Output: Some Hiragana: あ い う え お

That means I can avoid this problem by simply using std::string instead of std::wstring, but I'd like to know why this is happening.


Solution

  • Visual Studio (VS) is interpreting the source file as Windows-1252 instead of UTF-8.

    For example, 'あ' is encoded in UTF-8 as the bytes E3 81 82, but the compiler reads each byte as a separate Windows-1252 character and converts it to the corresponding UTF-16 code unit: E3 becomes U+00E3 ('ã') and 82 becomes U+201A ('‚'), which together display as 'ã‚'. The byte 81 is unassigned in Windows-1252, so either VS drops it or it maps it to the C1 control character U+0081, which Qt does not render.

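    The direct version works because the compiler performs no conversion on the narrow literal and leaves the bytes as E3 81 82. Qt then decodes them as UTF-8, since in Qt 5 and later the QString constructor taking a const char* uses QString::fromUtf8().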

    To fix the issue, you need to tell VS that the file is UTF-8. According to other posts, one way is to save the file with a UTF-8 BOM; Visual Studio 2015 Update 2 and later also accept the /utf-8 compiler option.

    The only portable fix is to keep the source file pure ASCII and use universal character names (\uXXXX escapes) instead:

    L"Some Hiragana: \u3042 \u3044 \u3046 \u3048 \u304A"