
Read XML tags inside StartElement (QXmlStreamReader)


So, I'm attempting to read through a modest-sized XML document. It's structured like so:

<project identifier="project1">
    <author>Joe Smith</author>
    <author2>Rick Jones</author2>
    <path>projects/internal/project2</path>
    <version>1.51</version>
</project>
<project identifier="project2">
     <author>Terry Chimes</author>
     <author>Janie Jones</author>
     <path>projects/external/project2</path>
     <version>19.77</version>
</project>

... and so on, for several hundred projects.

I'm using Qt5.10's QXmlStreamReader, which may have been created (or documented) by sadists.

I can find each project by using xmlReader.readNextStartElement - or by reading tag-by-tag until I find one with internal attributes (only project tags have attributes in this file).

But as soon as I read one of these parent elements, the QXmlStreamReader sucks up every tag up to its closing </project> tag. The problem is that I need to get at some of that data: in this case, what's inside the <path></path> tags.

I can retrieve all the slurped-up data with xmlReader.readElementText(QXmlStreamReader::IncludeChildElements), but that's just one big data dump without the tags.

Does anyone know how I can "rewind" and read the internal tags? Or stop the stream reader from lurching forward and sucking up all the data?


Solution

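  • Most likely something in your code is causing this: QXmlStreamReader does not skip inner elements when parsing a document. Note that readElementText() reads through to the end of the current element, so calling it on a <project> start element consumes all of that element's children. You haven't provided any source code, though, so it's impossible to tell exactly what went wrong.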

    Here's a code sample that works for me on an example very similar to yours, with Qt 5.9.2 on macOS 10.13.2:

    #include <QCoreApplication>
    #include <QDebug>
    #include <QXmlStreamReader>
    #include <QFile>
    #include <QHash>
    
    int main(int argc, char *argv[])
    {
        QCoreApplication a(argc, argv);
    
        if (argc != 2) {
            qWarning() << "Usage: " << argv[0] << " <file>";
            return 1;
        }
    
        QFile file(argv[1]);
        if (!file.open(QIODevice::ReadOnly)) {
            qWarning() << "Failed to open file " << argv[1] << " for reading";
            return 1;
        }
    
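        QXmlStreamReader reader(&file);
        QString currentProjectId;   // identifier of the <project> currently being read
        QHash<QString,QString> pathByProjectId;
        while(!reader.atEnd())
        {
            // Advance one token at a time; child elements arrive as their own tokens.
            reader.readNext();
    
            if (reader.isEndDocument()) {
                break;
            }
    
            if (reader.isStartElement())
            {
                QStringRef elementName = reader.name();
                if (elementName == "project") {
                    // Remember which project the following child elements belong to.
                    QXmlStreamAttributes attrs = reader.attributes();
                    currentProjectId = attrs.value("identifier").toString();
                }
                else if (elementName == "path") {
                    // Consumes only this element's text, up to its closing </path>.
                    pathByProjectId[currentProjectId] = reader.readElementText(QXmlStreamReader::IncludeChildElements);
                }
            }
        }
    
        // atEnd() is also true after a parse error, so report that case explicitly.
        if (reader.hasError()) {
            qWarning() << "XML parse error:" << reader.errorString();
            return 1;
        }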
    
        for(auto it = pathByProjectId.constBegin(),
            end = pathByProjectId.constEnd(); it != end; ++it)
        {
            qDebug() << "Path for project " << it.key() << ": " << it.value();
        }
    
        file.close();
    
        return 0;
    }
    

    Here's a slightly modified version of your example, which I fed to this sample program:

    <?xml version="1.0" encoding="UTF-8"?>
    <body>
    <project identifier="project1">
        <author>Joe Smith</author>
        <author2>Rick Jones</author2>
        <path>projects/internal/project1</path>
        <version>1.51</version>
    </project>
    <project identifier="project2">
         <author>Terry Chimes</author>
         <author>Janie Jones</author>
         <path>projects/external/project2</path>
         <version>19.77</version>
     </project>
     </body>
    

    I added an XML version/encoding declaration and a top-level body tag, so that QXmlStreamReader doesn't treat the first project tag as the root element of the entire document. I also changed the path of the first project so that it differs from the second project's path.

    And here's the output I got:

    Path for project  "project1" :  "projects/internal/project1"
    Path for project  "project2" :  "projects/external/project2"