Unicode characters, C++ and libcurl

I use libcurl to download data into a stringstream, and I have a function that parses it:

bool parse()
{
    istringstream temp(buff.str());
    string line;
    QString line_QStr, lyrics_QStr;
    // skip everything up to the line containing the start marker
    while (temp.good())
    {
        getline(temp, line);
        if (QString::fromStdString(line).contains(startMarker)) break;
    }
    if (!temp.good()) return false; // something went wrong

    // collect lines until the end marker is found
    while (temp.good())
    {
        getline(temp, line);
        if ((line_QStr = QString::fromStdString(line)).contains(endMarker))
        {
            lyrics_QStr += line_QStr.remove(endMarker); // remove the </div>
            break;
        }
        else
            lyrics_QStr += line_QStr;
    }

    if (!temp.good()) return false;

    QTextDocument lyricsHtml;
    lyricsHtml.setHtml(lyrics_QStr); // load the collected HTML
    lyrics_qstr = lyricsHtml.toPlainText();
    return true;
}

When the text is ASCII-only, it works fine. But if it contains Unicode characters, I lose them somewhere in this function, and the output looks something like this:

[Screenshot: Unicode chars are messed up]

I use string and getline instead of QTextStream and QString because I couldn't find a counterpart to the good() function there, so I couldn't do any decent error handling.

What am I doing wrong in this function that makes each Unicode character get lost and displayed as two other characters? How can I fix it? Thanks in advance!

EDIT: I changed the parse function to this:

bool LyricsManiaDownloader::parse()
{
    wistringstream temp(string2wstring(buff.str()));
    wstring line;
    QString line_QStr, lyrics_QStr;
    // skip everything up to the line containing the start marker
    while (temp.good())
    {
        getline(temp, line);
        if (QString::fromStdWString(line).contains(startMarker)) break;
    }
    if (!temp.good()) return false; // something went wrong

    // collect lines until the end marker is found
    while (temp.good())
    {
        getline(temp, line);
        if ((line_QStr = QString::fromStdWString(line)).contains(endMarker))
        {
            lyrics_QStr += line_QStr.remove(endMarker); // remove the </div>
            break;
        }
        else
            lyrics_QStr += line_QStr;
    }

    if (!temp.good()) return false;

    QTextDocument lyricsHtml;
    lyricsHtml.setHtml(lyrics_QStr); // load the collected HTML
    lyrics_qstr = lyricsHtml.toPlainText();
    return true;
}

And the string2wstring function is

wstring string2wstring(const string &str)
{
    wstring wstr(str.length(), L' ');
    copy(str.begin(), str.end(), wstr.begin());
    return wstr;
}

But there is still a problem with the encoding.

EDIT2: I use this function for saving data into a stringstream

size_t write_data_to_var(char *ptr, size_t size, size_t nmemb, void *userdata)
{
    ostringstream * stream = (ostringstream*) userdata;
    size_t count = size * nmemb;
    stream->write(ptr, count);
    return count;
}
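
As a sanity check, the callback can be exercised by hand without libcurl. The sketch below restates the callback so it compiles on its own; collect_chunks is a hypothetical helper (not part of my program) that feeds it two chunks the way libcurl might. It suggests the raw bytes arrive in the stream unchanged, even when a chunk boundary falls in the middle of a multi-byte character.

```cpp
#include <cassert>
#include <cstddef>
#include <ios>
#include <sstream>
#include <string>

// Same callback as above, restated so this sketch is self-contained.
std::size_t write_data_to_var(char *ptr, std::size_t size, std::size_t nmemb, void *userdata)
{
    std::ostringstream *stream = static_cast<std::ostringstream *>(userdata);
    std::size_t count = size * nmemb;
    stream->write(ptr, static_cast<std::streamsize>(count));
    return count;
}

// Hypothetical helper: call the callback twice, as if the transfer
// had been delivered in two chunks, and return what was collected.
std::string collect_chunks(const std::string &a, const std::string &b)
{
    std::ostringstream buff;
    std::string first = a, second = b; // mutable copies, the callback takes char*
    write_data_to_var(&first[0], 1, first.size(), &buff);
    write_data_to_var(&second[0], 1, second.size(), &buff);
    return buff.str();
}

// The two bytes of a UTF-8 "é" (0xC3 0xA9) are split across the chunks
// here, yet the collected buffer holds the original byte sequence.
void demo_chunks()
{
    assert(collect_chunks("caf\xC3", "\xA9") == "caf\xC3\xA9");
}
```

So at this stage the buffer is still just bytes; nothing has been decoded or damaged yet.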

I pass the std::ostringstream buff to curl, and the web page data is saved here. Then I use a wistringstream, convert buff.str() to wstring and use it as a source for wistringstream. The conversion from std::string to std::wstring is the decoding, isn't it?
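
To make the question concrete, here is a small sketch comparing what string2wstring does with what an actual decoder does, assuming the downloaded bytes are UTF-8 (that is an assumption; the real encoding depends on the server). widen_bytes repeats the copy approach from above, and decode_utf8 is a hypothetical, simplified decoder written only for illustration: it does not reject overlong forms or surrogates.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Same approach as string2wstring above: every byte becomes one wide char.
std::wstring widen_bytes(const std::string &str)
{
    std::wstring wstr(str.length(), L' ');
    std::copy(str.begin(), str.end(), wstr.begin());
    return wstr;
}

// Hypothetical minimal UTF-8 decoder (illustration only, not hardened).
// Invalid or truncated sequences become U+FFFD.
std::u32string decode_utf8(const std::string &bytes)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char lead = static_cast<unsigned char>(bytes[i]);
        char32_t cp = 0;
        std::size_t extra = 0; // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { out.push_back(0xFFFD); ++i; continue; } // invalid lead byte
        if (i + extra >= bytes.size()) { out.push_back(0xFFFD); break; } // truncated
        bool ok = true;
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (c & 0x3F); // append the 6 payload bits
        }
        if (ok) { out.push_back(cp); i += extra + 1; }
        else    { out.push_back(0xFFFD); ++i; }
    }
    return out;
}

// "café" in UTF-8 is 5 bytes: the copy keeps 5 units, decoding gives 4.
void demo_decode()
{
    const std::string utf8 = "caf\xC3\xA9";
    assert(widen_bytes(utf8).size() == 5);
    assert(decode_utf8(utf8).size() == 4);
    assert(decode_utf8(utf8)[3] == char32_t(0xE9)); // U+00E9, é
}
```

If this is right, copying bytes into a wstring only widens them and never combines the two bytes of a character into one code point, which would match the "2 other chars" symptom. In the Qt code, QString::fromUtf8 would presumably take the place of decode_utf8, again assuming the page really is UTF-8.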


  • The Web server returns a stream of bytes alongside a header that indicates what encoding those bytes should be understood as.