Filtering out formatting characters between consecutive XML tags with Xerces C++


I'd appreciate pointers on how to get the (non-element) text between tags. For example, given an element whose content is ABC, I'd like to get the text ABC.

Currently, I'm able to use DefaultHandler::characters(const XMLCh *const chars, const XMLSize_t length) to get the characters between two consecutive start or end tags. Unfortunately, I'm also getting unnecessary newlines and formatting spaces between parent tags and child elements. For example, in the snippet below I'm getting 5 extra formatting characters (one newline and four spaces):

<Parent>               <!-- Newline here -->
    <Child>XYX</Child> <!-- Four spaces here -->
</Parent>

What would be the best (standard) way of filtering out these formatting characters?


Solution

  • Solved. For posterity's sake, here's how I did it.

    1. Because the desired characters appear between the consecutive start and end tags that define an element, I store the local name in DefaultHandler::startElement() at the start of an element and compare it with the next local name that is encountered.

    2. If the next local name encountered belongs to a new element then the intervening characters must be formatting characters and should be ignored.

    3. If, however, the next local name encountered is the same (that is, it belongs to the element's own end tag), then the intervening characters form the desired string.
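
The three steps above can be sketched roughly as follows. This is only an illustration, not the original code: `LeafTextCollector` and `replayExample` are hypothetical names, and the callbacks take `std::string` instead of `XMLCh` so the sketch compiles without Xerces. In a real handler the same logic would presumably live in overrides of DefaultHandler::startElement(), characters() and endElement(). The sketch also adds one guard the steps do not mention: a flag recording that the previous tag was a start tag, so that text following an end tag is never kept. Text is appended rather than assigned because a SAX parser may deliver one run of characters in several characters() calls.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a DefaultHandler subclass. It mirrors the three
// SAX callbacks but uses std::string so it builds without Xerces.
class LeafTextCollector {
public:
    using Entry = std::pair<std::string, std::string>;  // (local name, text)

    // Step 1: remember the local name of the element that just opened.
    // Anything buffered before this start tag was followed by a new element,
    // so by step 2 it is formatting whitespace and is dropped.
    void startElement(const std::string& localName) {
        lastStart_ = localName;
        afterStart_ = true;
        buffer_.clear();
    }

    // Buffer text only while the previous tag was a start tag. Append,
    // because one text run may arrive in several chunks.
    void characters(const std::string& chars) {
        if (afterStart_) buffer_ += chars;
    }

    // Step 3: the next local name matches the stored one, so the buffered
    // characters are the element's text. Otherwise they are ignored.
    void endElement(const std::string& localName) {
        if (afterStart_ && localName == lastStart_) {
            results_.emplace_back(localName, buffer_);
        }
        afterStart_ = false;  // assumption: text after an end tag is formatting
        buffer_.clear();
    }

    const std::vector<Entry>& results() const { return results_; }

private:
    std::string lastStart_;
    std::string buffer_;
    bool afterStart_ = false;
    std::vector<Entry> results_;
};

// Replays the events a SAX parser would emit for the <Parent>/<Child>
// example from the question, with the text "XYX" split into two chunks.
inline std::vector<LeafTextCollector::Entry> replayExample() {
    LeafTextCollector handler;
    handler.startElement("Parent");
    handler.characters("\n    ");   // newline + four spaces: discarded
    handler.startElement("Child");
    handler.characters("XY");
    handler.characters("X");
    handler.endElement("Child");    // records ("Child", "XYX")
    handler.characters("\n");       // after an end tag: discarded
    handler.endElement("Parent");
    return handler.results();
}
```

One limitation of this approach: it only captures text of elements that have no child elements, so mixed content (text interleaved with children) is dropped along with the formatting whitespace.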