Xerces-C: DOM XML parsing


I have a question about XML parsing. I was experimenting with a sample program and modified it a bit to understand how parsing works. However, I've encountered output I don't quite understand, and I hope some of you can shed light on what is going on.

This is my XML file:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<root xmlns="http://www.test.com">
   <ApplicationSettings>
           <option_a>"10"</option_a> 
           <option_b>"24"</option_b>
   </ApplicationSettings>
</root>

I inserted debug statements throughout my program to understand what happens when functions such as getChildNodes() are called. This is the output I received:

Parsing xml file...
Processing Root...
Processing children with getChildNodes()...
>>>>>>>>>>> Loop child 0: Node name is: #text
>>>>>>>>>>> Loop child 1: Node name is: ApplicationSettings
= ApplicationSettings processing children with getChildNodes()...
***** iter 0 child name is #text
***** iter 1 child name is option_a
***** iter 2 child name is #text
***** iter 3 child name is option_b
***** iter 4 child name is #text
>>>>>>>>>>> Loop: 2 Node name is: #text

From the output, I can infer that it parsed my XML file correctly. However, I noticed the program also found extra nodes named #text (printed using the getNodeName() function). My question is: what do those #text nodes refer to, and why do they appear repeatedly throughout the loops?

Thanks!


Solution

  • Those #text nodes in your example are the whitespace between tags. For example, here

    <root xmlns="http://www.test.com">
       <ApplicationSettings>
    

    there is a line feed followed by the indentation spaces between ...com"> and <App...., and the parser reports that whitespace as a text node.

    To see the difference, try parsing the same document with all whitespace between the tags removed:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <root xmlns="http://www.test.com"><ApplicationSettings><option_a>"10"</option_a><option_b>"24"</option_b></ApplicationSettings></root>
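
If the indentation has to stay in the file, another option is to skip whitespace-only text nodes while looping over the children. The sketch below illustrates that filter in plain C++ with a hypothetical `Node` struct standing in for a DOM node. `Node`, `NodeKind`, `isWhitespaceOnly`, `significantChildren`, `applicationSettingsChildren`, and `demo` are names made up for this illustration and are not part of Xerces-C. With the real library, the equivalent checks would presumably be made on `DOMNode::getNodeType()` and `DOMNode::getNodeValue()`.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a DOM node (NOT a Xerces-C type). In Xerces-C the
// same information comes from getNodeType(), getNodeName() and getNodeValue().
enum class NodeKind { Element, Text };

struct Node {
    NodeKind kind;
    std::string name;   // "#text" for text nodes, the tag name for elements
    std::string value;  // character data for text nodes, empty for elements
};

// True if s is empty or contains only XML whitespace
// (space, tab, carriage return, line feed).
bool isWhitespaceOnly(const std::string& s) {
    return std::all_of(s.begin(), s.end(), [](char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n';
    });
}

// Returns the children that carry content: every element, plus any text node
// that holds something other than whitespace.
std::vector<Node> significantChildren(const std::vector<Node>& children) {
    std::vector<Node> kept;
    for (const Node& child : children) {
        const bool isPadding =
            child.kind == NodeKind::Text && isWhitespaceOnly(child.value);
        if (!isPadding) {
            kept.push_back(child);
        }
    }
    return kept;
}

// Children of <ApplicationSettings> in the order the debug output lists them.
// The whitespace values are illustrative, not copied from a real parse.
std::vector<Node> applicationSettingsChildren() {
    return {
        {NodeKind::Text, "#text", "\n        "},
        {NodeKind::Element, "option_a", ""},
        {NodeKind::Text, "#text", "\n        "},
        {NodeKind::Element, "option_b", ""},
        {NodeKind::Text, "#text", "\n"},
    };
}

// Usage example: five children go in, only the two elements come out.
void demo() {
    const std::vector<Node> all = applicationSettingsChildren();
    const std::vector<Node> kept = significantChildren(all);
    assert(all.size() == 5);
    assert(kept.size() == 2);
    assert(kept[0].name == "option_a");
    assert(kept[1].name == "option_b");
}
```

Calling `demo()` from a `main` function runs the checks. The filter keeps text nodes that contain real character data (such as the `"10"` inside `option_a`), so only the indentation between tags is dropped.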