c++ tinyxml2

TinyXML2 get text from node and all subnodes


How does one go about getting the text from the nodes and subnodes in TinyXML2?

The XMLPrinter class seems to do what I need, but it does not print the text properly.

My XML:

<div>The quick brown <b>fox</b> jumps over the <i>lazy</i> dog.</div>

My class which extends the XMLPrinter class:

class XMLTextPrinter : public XMLPrinter {
    virtual bool    VisitEnter (const XMLDocument &) { return true; }
    virtual bool    VisitExit (const XMLDocument &)  { return true; }
    virtual bool    VisitEnter (const XMLElement &e, const XMLAttribute *)  {
        auto text = e.GetText();
        if(text) {
            std::cout << text;
        }
        return true;
    }
    virtual bool    VisitExit (const XMLElement &e)  { return true; }
    virtual bool    Visit (const XMLDeclaration &)  { return true; }
    virtual bool    Visit (const XMLText &e) { return true; }
    virtual bool    Visit (const XMLComment &)  { return true; }
    virtual bool    Visit (const XMLUnknown &)  { return true; }
};

My code:

XMLDocument document;
document.Parse(..., ...);

auto elem = ...;

XMLTextPrinter printer;
elem->Accept(&printer);

The output:

The quick brown foxlazy

Why is it ignoring all the text that comes after the <b> and <i> elements? How can I solve this? Also, the XMLPrinter class prints it out properly with the tags, but I do not want the tags.


Solution

  • [Edited 14-Apr-17 to improve (I hope).]

    XMLPrinter derives from XMLVisitor and prints the XML document (or element) in full: tags, attributes and all. The traversal itself is driven by each node's Accept() method, which walks down and back up the XML hierarchy and calls back into the visitor: VisitEnter/VisitExit for nodes that can have descendants (children), i.e. documents and elements, and Visit for leaf nodes, i.e. text, comments, declarations and unknown nodes. XMLVisitor's own implementations of these methods do nothing except return true (which tells Accept() to keep going), so in a derived class you override only the ones that implement the functionality you want.

    The first problem is that you derive from XMLPrinter, whose whole job is to build a printable representation of the XML document, and then replace every one of its Visit... methods with your own, so none of its printing code is used. It is simpler, and less work, to derive from XMLVisitor directly.

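    Secondly, you read the element text in VisitEnter using GetText(), which does not work once child elements are embedded in the text. As the TinyXML2 documentation for GetText() explains, it returns the text of the element's first child only, and only if that child is a text node (otherwise it returns null). For your <div> it returns "The quick brown ", for <b> "fox" and for <i> "lazy"; the text nodes that follow <b> and <i> are never printed, because your Visit(const XMLText &) ignores them.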

    To get the text of all elements, and nothing but the text, override Visit for the text leaf nodes, i.e. Visit(const XMLText &), and leave every other callback to XMLVisitor's defaults:

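    #include "tinyxml2.h"
    #include <iostream>
    
    using namespace tinyxml2;
    
    // Prints the value of every text node it visits, and nothing else.
    // All other callbacks keep XMLVisitor's defaults, which return true,
    // so Accept() still recurses into every child element.
    class XMLPrintText : public XMLVisitor
    {
    public:
       bool Visit(const XMLText& txt) override
       {
          std::cout << txt.Value();
          return true;   // keep visiting the remaining nodes
       }
    };
    
    int main()
    {
       XMLDocument doc;
       if (doc.Parse("<div>The quick brown <b>fox</b> jumps over the <i>lazy</i> dog.</div>") != XML_SUCCESS)
          return 1;   // malformed input
       XMLElement* div = doc.FirstChildElement();
       XMLPrintText prt;
       div->Accept(&prt);   // walks <div> and all of its descendants
       std::cout << '\n';
       return 0;
    }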