Tags: c++, xml, qt, http-get, qnetworkreply

QNetworkReply returning incomplete XML data


I'm sending an HTTP GET request to a remote server. The query parameters define the content I'm interested in. In particular, I make sure that output=xml is in the query, since it makes the server return its reply as XML.

I have the following connections between my class HttpRetriever and the respective QNetworkReply and QNetworkAccessManager (for the QNetworkRequest see slotStartRequest()):

connect(this->reply, SIGNAL(error(QNetworkReply::NetworkError)), this,
        SLOT(slotErrorRequest(QNetworkReply::NetworkError)));
connect(this->reply, &QNetworkReply::finished, this, &HttpRetriever::slotFinishRequest);
connect(this->manager, &QNetworkAccessManager::finished, this->reply, &QNetworkReply::deleteLater);
connect(this->reply, &QIODevice::readyRead, this, &HttpRetriever::slotReadyReadRequest);

The slots of interest here are defined as follows:

slotFinishRequest():

void HttpRetriever::slotFinishRequest()
{
    LOG(INFO) << "Finishing HTTP GET request from URL \"" << this->url.toString() << "\"";
    this->reply = Q_NULLPTR;

    // Reset validity of reply from a previous request
    this->validReply = false;
    // Skip validation if it's disabled
    if (!this->validateReply)
    {
        LOG(WARNING) << "Validation disabled. In order to enable it see the \"validate\" and \"validationMode\" in \"settings.ini\"";
        this->validReply = true;
    }
    else
    {
        // Validate reply
        this->validReply = validateReply();
    }

    if (!this->validReply)
    {
        return;
    }

    processReply(); // Parsing

    this->processingRequest = false;
}

slotReadyReadRequest():

void HttpRetriever::slotReadyReadRequest()
{
    LOG(INFO) << "Received data from \"" << this->url.toString() << "\"";
    this->bufferReply = this->reply->readAll();
}

Inside slotFinishRequest() I call processReply():

void HttpRetriever::processReply()
{
    LOG(INFO) << "Processing reply for request \"" << this->url.toString() << "\"";
    LOG(DEBUG) << QString(this->bufferReply);
    // Process the XML from the reply and extract necessary data

    QXmlStreamReader reader;
    reader.addData(this->bufferReply);

    // Read the XML reply and extract required data
    // TODO
    while (!reader.atEnd())
    {
        LOG(DEBUG) << "Reading XML element";
        reader.readNextStartElement();

        QXmlStreamAttributes attributes = reader.attributes();
        foreach (QXmlStreamAttribute attrib, attributes)
        {
            LOG(DEBUG) << attrib.name();
        }
    }
    if (reader.hasError())
    {
        LOG(ERROR) << "Encountered error while parsing XML data:" << reader.errorString();
    }

    LOG(INFO) << "Sending data to data fusion handler";
    // TODO
}

I trigger the HTTP GET request through the following slot:

void HttpRetriever::slotStartRequest(quint32 id)
{
    if (this->processingRequest)
    {
        this->reply->abort();
    }

    this->processingRequest = false;

    // The first request also instantiates the manager. If the slot is called after the instance of HttpRetriever
    // has been moved to a new thread, this ensures proper parenting
    if (!this->manager)
    {
        this->manager = new QNetworkAccessManager(this);
    }

    quint32 requestId = generateRequestId(id);
    if (!this->url.hasQuery())
    {
        LOG(WARNING) << "Attempting to start request without parameters";
    }

    // Part of the filters applied to the request to reduce the data received (for more see slotSetRequestParams())
    QUrlQuery query(this->url.query());
    query.addQueryItem("input", QString::number(requestId));
    // TODO Add more filters; see documentation

    this->url.setQuery(query);

    LOG(INFO) << "Requesting data from \"" << this->url.toString() << "\" with request ID:" << requestId;

    QNetworkRequest request(this->url);
    this->reply = this->manager->get(request);

    // Establish connections from/to the reply and the network access manager
    connect(this->reply, SIGNAL(error(QNetworkReply::NetworkError)), this,
            SLOT(slotErrorRequest(QNetworkReply::NetworkError)));
    connect(this->reply, &QNetworkReply::finished, this, &HttpRetriever::slotFinishRequest);
    connect(this->manager, &QNetworkAccessManager::finished, this->reply, &QNetworkReply::deleteLater);
    connect(this->reply, &QIODevice::readyRead, this, &HttpRetriever::slotReadyReadRequest);
}

As you can see, so far I have only laid the foundation for the network communication between my class and the server. I have yet to start parsing the XML reply and extracting the information I need from it.

The problem is that I very often get either

Encountered error while parsing XML data: Start tag expected.

or

Encountered error while parsing XML data: Premature end of document

in my processReply() function. This happens every time I get a large reply (a few hundred to a few thousand lines). It never happens with a small one (roughly 30-40 lines).

So the issue is evidently related to the amount of data I receive: either the way it is assembled by QNetworkAccessManager (or whichever Qt component buffers the received chunks of data), or the way I have set up the network-related components in my class. One important note: in my browser (latest Firefox with the HttpRequester add-on) I always receive the complete XML, no matter how large it is. So the problem seems to be specific to my application and unrelated to the network settings on my system.


Solution

  • Since @Marco didn't write the answer...

    The problem was that I was overwriting my buffer on every readyRead() signal by assigning it the result of QNetworkReply::readAll(). A large reply arrives in several chunks, so only the last chunk was left in the buffer when parsing started. As suggested, using QByteArray::append() instead solves the problem.

    This fix has one possible side effect: the buffer keeps growing across requests, because every new reply is appended to the previous one. To prevent that, QByteArray::clear() needs to be called at some point, for example when the finished() signal is emitted. Of course, the buffer contents have to be processed before they are cleared.
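The difference between overwriting and appending can be sketched without Qt. This is only an illustration under stated assumptions: `ReplyBuffer` and `receive()` are hypothetical names that do not appear in the question, `std::string` stands in for `QByteArray`, and each chunk passed in represents what `readAll()` would return on one `readyRead()` signal.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the buffering part of HttpRetriever.
// std::string plays the role of QByteArray here.
class ReplyBuffer
{
public:
    // Behaviour from the question: every chunk replaces the previous one,
    // so only the most recent chunk survives.
    void onReadyReadOverwrite(const std::string &chunk) { buffer = chunk; }

    // Fixed behaviour: chunks are accumulated
    // (the counterpart of QByteArray::append() in Qt).
    void onReadyReadAppend(const std::string &chunk) { buffer.append(chunk); }

    // To be called once the transfer has finished: returns the complete
    // payload and leaves the buffer empty for the next request
    // (the counterpart of processing the data, then QByteArray::clear()).
    std::string takeAll()
    {
        std::string complete;
        complete.swap(buffer); // buffer is empty after the swap
        return complete;
    }

private:
    std::string buffer;
};

// Hypothetical helper that simulates one reply arriving in several chunks
// and returns what the parser would see when the transfer finishes.
std::string receive(const std::vector<std::string> &chunks, bool append)
{
    ReplyBuffer replyBuffer;
    for (const std::string &chunk : chunks) {
        if (append) {
            replyBuffer.onReadyReadAppend(chunk);
        } else {
            replyBuffer.onReadyReadOverwrite(chunk);
        }
    }
    return replyBuffer.takeAll();
}
```

With a reply split into three chunks, the overwriting variant hands only the last fragment to the parser, which would be consistent with errors such as "Start tag expected", while the appending variant hands over the whole document. In the code from the question, the corresponding change would presumably be `this->bufferReply.append(this->reply->readAll());` in slotReadyReadRequest() and `this->bufferReply.clear();` after processReply() has run.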