Tags: c++, qt, utf-8, stream, qdebug

Qt and UTF-8: strange behaviour


To illustrate my problem I will give you an example:

I have a UTF-8 encoded text file.

in.txt:

ąśćź
ąś
ŻźŹ

This program reads in.txt line by line and writes an identical copy to out.txt. Besides copying the file, it also prints each line to the console. At the end it creates a QString containing the same text as the first line of the file.

#include <QtCore>

int main()
{
    QVector<QString> qv;

    QFile file("in.txt");
    if (!file.open(QIODevice::ReadOnly | QIODevice::Text))
        return -1;

    QTextStream in(&file);
    in.setCodec("UTF-8");
    while (!in.atEnd())
    {
        QString line = in.readLine();
        qv.append(line);
    }

    QFile file2("out.txt");
    if (!file2.open(QIODevice::WriteOnly | QIODevice::Text))
        return -1;

    QTextStream out(&file2);
    out.setCodec("UTF-8");
    for (int i = 0; i < qv.size(); ++i)
    {
        //Debugging output
        qDebug() << qv[i];

        out << qv[i] << "\n";
    }

    // Important part!!!

    qDebug() << "Why?";
    QString s("ąśćź"); //same as the first line of file!

    qDebug() << s;
}

The console output is a mystery:

"????" 
"??" 
"???" 
Why? 
"ąśćź"

out.txt: (duplicate)

ąśćź
ąś
ŻźŹ

Why does it first print "????" to the console while making the copy, and then print "ąśćź" when I hardcode "ąśćź" into my program? What is the problem? It creates an identical copy of in.txt, so QString and QTextStream seem to work fine.

Thanks in advance.


Solution

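  • This doesn't explain why it happens, but printing the UTF-8 encoded bytes instead of the QString itself seems to fix it:

    for (int i = 0; i < qv.size(); ++i)
    {
        // Debugging output: print the UTF-8 bytes (QByteArray) rather than the QString
        qDebug() << qv[i].toUtf8();

        out << qv[i] << "\n";
    }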