libxml++ TextReader; Skipping nodes

I'm using libxml++ to parse a rather large XML file and therefore can't use DOM.

Say I have an XML file like:

<?xml version="1.0"?>

<root>

  <book name="book1">
    <chapter name="chapter1">
      #Pages
    </chapter>
    <chapter name="chapter2">
      #Pages
    </chapter>
  </book>

  <book name="book2">
    <chapter name="chapter1">
      #Pages
    </chapter>
    <chapter name="chapter2">
      #Pages
    </chapter>
  </book>

  <book name="book3">
    <chapter name="chapter1">
      #Pages
    </chapter>
    <chapter name="chapter2">
      #Pages
    </chapter>
  </book>

</root>

Is there a way to loop over all books without having to deal with the nested nodes using TextReader? Is it possible with SAX parsers in general?

EDIT: Moved solution to answer.

Solution

I may have found some (partial) solutions.

Whereas read() advances to the very next node and therefore descends into deeper levels, next() jumps to the next node at the current depth, skipping the current node's subtree. Calling read() twice moves the reader to the opening tag of the first book (depth 1). Calling next() then makes the reader jump to the next node at depth 1, in this case the closing tag. You can now loop over all books by calling next() repeatedly; it returns false once there are no more nodes at depth 1.

Unfortunately, there is no way to move the reader back up the tree. If you call read() inside the loop and descend to a deeper level, next() will jump to the next node on that level, so this may not be a satisfying answer in most cases.

Another way is to call get_current_node() on the reader and then get_children() on the returned node to retrieve a list of its direct child nodes. In this example, you could call read() to move the reader to the root node, then call get_current_node() followed by get_children(), and iterate over the resulting list of 'book' nodes.

This only seems to work for small files: calling get_children() on a node with many child nodes may return a truncated list containing only a fraction of the children.

A possible workaround I found is to navigate to the desired depth (as described above), loop over the nodes at that depth by calling next(), and in each iteration obtain a Node object by calling expand() on the TextReader, which expands the current node and all its subtrees. This way you can work on the subtree through the new node without moving the TextReader.

However, be careful: the C++ wrapper of the new node is not deleted unless you call free_wrappers().

From the Documentation:

The C++ wrappers are not deleted. Using this method (expand()) causes memory leaks, unless you call xmlpp::Node::free_wrappers(), which is not intended to be called by the application.

Note that all of this is based on my own observations, as the documentation for these functions is sparse or incomplete.