Tags: c++, visual-studio, qt, utf-8, string-literals

How can UTF-8 fail to work in Qt 5?


I have a minimal test project built with MSVC 2015 and Qt 5.9.3. The file main.cpp is saved as Unicode (UTF-8 with signature, i.e. with a BOM).

I am trying to show a message box containing some Russian text. The complete code:

#include <QtWidgets/QApplication>
#include <QMessageBox>

int main(int argc, char *argv[])
{
    QApplication a(argc, argv);

    QString ttl = QString::fromUtf8("russian_word_1");
    QString txt = QString::fromUtf8("russian_word_2");

    QMessageBox::information(nullptr, ttl, txt);

    return a.exec();
}

This is what I get:

[Screenshot: the resulting message box]

How is this possible?


Update 1: I specifically want to use UTF-8 with a BOM, following this statement by a Stack Overflow author:

...It does not make sense to have a string without knowing what encoding it uses...


Update 2: In this particular case, it is most likely a compiler bug.


Solution

  • If the compiler produces garbage strings for UTF-8 source files that have a BOM, then it's a bug in the compiler. However, the use of a BOM with UTF-8 is not recommended in the first place. You shouldn't use it unless you actually have a reason to.

    Furthermore, you don't need to do explicit fromUtf8() conversions. You can just do:

    QString ttl = "russian_word_1";
    QString txt = "russian_word_2";
    

    QString assumes string literals are UTF-8. From the documentation:

    In all of the QString functions that take const char * parameters, the const char * is interpreted as a classic C-style '\0'-terminated string encoded in UTF-8.

    You may use QStringLiteral to wrap string literals as an optimization, but this is not required.

    Lastly, you can wrap the string literals in tr() if you might at some point translate the application from Russian into other languages. It is generally a good idea to use tr() from the start in case you later decide to add translations.

    Note that having non-English strings in source code is generally fine. It's what UTF-8 (and Unicode in general) is there for. All modern compilers support it. What most people frown upon, however, is non-English code:

    auto индекс = 0; // Please don't.
    

    But non-English strings are fine.