
Find XML element 'start' and 'end' using tinyxml2 (or other C++ XML library)


I am trying to iterate through the elements of an XML document and fire events on 'start' elements and 'end' elements.

This is pretty straightforward using Python's lxml module, and there is even another question on SO regarding this:

Using Python's xml.etree to find element start and end character offsets

#!/usr/bin/env python3
import sys
from lxml import etree

# Usage: script.py <dtd-file> <xml-file>
# Files are opened in binary mode so lxml handles any encoding declaration itself.
dtd = etree.DTD(open(sys.argv[1], "rb"))
xml = etree.XML(open(sys.argv[2], "rb").read())

# Validate the document and report every validation error.
result = dtd.validate(xml)
for error in dtd.error_log.filter_from_errors():
    print(error.message)
    print(error.line)
    print(error.column)

if result:
    # Walk the tree, receiving one event when an element opens and one when it closes.
    for event, elem in etree.iterwalk(xml, events=('start', 'end')):
        if event == 'start':
            print('starting element:', elem.tag)
        elif event == 'end':
            print('ending element:', elem.tag)
            # The tail is the text that follows the closing tag.
            if elem is not xml:
                print(elem.tail)

I would like to do essentially the same thing using the tinyxml2 C++ XML library, but I have not had any luck so far (specifically with finding closing tags).

I prefer tinyxml2 as it is 'tiny', but I am open to other C++ XML libs if they can achieve this end (more easily).

If there is a better way to fire events on end tags, I am open to that as well.


Solution

  • tinyxml2 offers a very basic (and very fast) implementation for parsing and navigating an XML structure. RapidXML is likely faster, but it has the same basic behavior.

    If catching start and end events is absolutely mandatory, I suggest using Xerces, because its SAXParser lets you catch both when the parser enters an XML element and when it exits that element. The big inconvenience, in my humble opinion, is compiling it under MSVC: it is damned tedious because you must compile the Apache commons in C++, whereas under a gcc environment I think it is trivial in comparison. Good luck!