
Java & XML : clarification on using getNextSibling() vs getFirstChild()


I am working through a Java/XML tutorial and have questions regarding getNextSibling() and getFirstChild().

At times I am able to follow along and determine which calls are needed but find myself stumbling when it comes to these 2 calls and the expected results.

Here is the data being used.

<?xml version="1.0" encoding="UTF-8"?>
<AllStorage>
  <NorthAmerica>
      <EastCoast>
        <DeliveryLocations>
            <Location>North East </Location>
            <Item1>Full</Item1>
            <Item2>Empty</Item2>
        </DeliveryLocations>
      </EastCoast>
  </NorthAmerica>
</AllStorage>

The following is the code being used.

Within the code are my comments describing what I am seeing as the code progresses.

There are 3 questions embedded within the code at different locations where I seem to get tripped up.

import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class XMLFromjava2  
{
    public static void main(String[] args) throws IOException, ParserConfigurationException, org.xml.sax.SAXException 
    {

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setIgnoringComments(true);
        factory.setCoalescing(true); 
        factory.setNamespaceAware(false); 
        factory.setValidating(false); 
    
        DocumentBuilder parser = factory.newDocumentBuilder();
    
        Document document = parser.parse(new File("C:\\Downloads\\DummyData2.xml"));
    
        NodeList locations = document.getElementsByTagName("DeliveryLocations");   ///Starting point is the <DeliveryLocations> tag
        int numLocations = locations.getLength();
        System.out.println("There are " + numLocations + " locations.");
    
        for (int i = 0; i < numLocations; i++)                  ///Iterates through all DeliveryLocations tag (Only 1 in this example.)
        {                                                               
              Element section = (Element) locations.item(i);    
              
              System.out.println("Node : "  +  section.getNodeName());   ///Prints out the DeliveryLocations tag.
              
              Node textNode = section.getFirstChild();      ////First Child is the Text Node which is empty (not NULL)
              
              if (textNode.getNodeValue() != null && textNode.getNodeValue().trim().length() > 0)  ////QUESTION 1 : Why is the Text Node empty and not null?????  
              {
                  System.out.println("     Text Node : " + textNode.getNodeValue());  ///If there was text for the <DeliveryLocations> tag it would appear here.
              }
              else
                  System.out.println("     Text Node : <null> or <empty>");  ///This is what is printed
              
              ////Going into the LOOP as a Text Node.
              while (textNode != null && textNode.getNodeType() != Node.ELEMENT_NODE)
              {
                  System.out.println ("     Before getNextSibling() NodeType : " + textNode.getNodeType()); ////Text Node
                  textNode = textNode.getNextSibling();        /////QUESTION 2 :  Why did getNextSibling() work shouldn't it require getNextChild()?????
                  System.out.println ("     After getNextSibling() NodeType : " + textNode.getNodeType());   ///Element Node   <Location>  
                  System.out.println("          Text Node : " + textNode.getNodeName());
              }
        
              if (textNode != null)
              {
                System.out.println("Data : " + textNode.getFirstChild().getNodeValue());    /////The firstChild is the Text Node and that value
                                                                                                ////is the actual data "North East"
                
                System.out.println("Confirm which Node : " + textNode.getNodeName());     ///This confirms still on Location Node.
                
                System.out.println("2nd print : " + textNode.getNextSibling().getNodeName()); ///QUESTION 3 : Why does this get a Text Node??? 
                                                                                                ///Wouldn't a getChild() get the Text node 
                                                                                                ///If the Node is Location then shouldn't its Sibling be Item1??
              }
        }
    }
}

The questions above all concern places where I expected a sibling call rather than a child call, or vice versa.

Are you able to help clarify the confusion?


Solution

    QUESTION 1 : Why is the Text Node empty and not null?????

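    The text node is not empty, it is blank: it contains the whitespace characters between the <DeliveryLocations> and <Location> tags, namely a line terminator (which the XML parser normalizes to a single \n, even if the file uses \r\n) and 12 space characters. getNodeValue() on a text node returns that text, so it is not null here.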

    Your code blindly assumes there is always a text node there. There might not be, so you should always check the node type.
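As a minimal sketch of such a check (the class name FirstChildCheck and the helper describeFirstChild are made up for illustration, and the XML is inlined as a string instead of being read from your file), you can look at the node type before touching the value:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class FirstChildCheck {
    // Hypothetical helper: describes the first child of the first element named tagName.
    static String describeFirstChild(String xml, String tagName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Node parent = doc.getElementsByTagName(tagName).item(0);
        Node first = parent.getFirstChild();
        if (first == null) {
            return "no children";
        }
        if (first.getNodeType() == Node.TEXT_NODE) {
            String value = first.getNodeValue();
            // A blank text node holds only whitespace; it is neither null nor zero-length.
            return value.trim().isEmpty()
                    ? "blank text, length " + value.length()
                    : "text: " + value.trim();
        }
        if (first.getNodeType() == Node.ELEMENT_NODE) {
            return "element: " + first.getNodeName();
        }
        return "other node type: " + first.getNodeType();
    }

    public static void main(String[] args) throws Exception {
        String indented = "<DeliveryLocations>\n    <Location>North East </Location>\n</DeliveryLocations>";
        String compact = "<DeliveryLocations><Location>North East </Location></DeliveryLocations>";
        System.out.println(describeFirstChild(indented, "DeliveryLocations")); // blank text, length 5
        System.out.println(describeFirstChild(compact, "DeliveryLocations"));  // element: Location
    }
}
```

With the indented input the first child is the whitespace text node; with the compact input there is no text node at all and the first child is the <Location> element, which is why the type check matters.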

    QUESTION 2 : Why did getNextSibling() work shouldn't it require getNextChild()?????

    There is no getNextChild() method. Remember, you're calling getFirstChild() on the parent node (the element), so from that node's perspective the call returns a child. You then call getNextSibling() on that child node, so from the child's perspective the call returns a sibling, i.e. another child of the same parent. The method names are always relative to the node they are called on:

    • getParentNode() - Walk up to the parent in the hierarchy
    • getFirstChild() / getLastChild() - Walk down to the child in the hierarchy
    • getNextSibling() / getPreviousSibling() - Walk sideways between nodes with the same parent
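To see the three directions side by side, here is a small sketch. The class NavigationDemo, the helper walk, and the tiny <Parent><A/><B/></Parent> document are all invented for this illustration; the document deliberately has no whitespace so that no text nodes get in the way:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NavigationDemo {
    // Hypothetical helper: walks down, sideways, back, and up, recording each node name.
    static String walk(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element parent = doc.getDocumentElement();
        Node first = parent.getFirstChild();        // down: called on the parent
        Node second = first.getNextSibling();       // sideways: called on a child
        Node back = second.getPreviousSibling();    // sideways again, other direction
        Node up = second.getParentNode();           // up: called on a child
        Node past = second.getNextSibling();        // null: there is no sibling after the last child
        return first.getNodeName() + " " + second.getNodeName() + " "
                + back.getNodeName() + " " + up.getNodeName() + " " + (past == null);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(walk("<Parent><A/><B/></Parent>")); // A B A Parent true
    }
}
```

Note that getNextSibling() returns null once you run off the end of the child list, so a loop that advances with it should test for null before using the result.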

    QUESTION 3 : Why does this get a Text Node???

    Because there is whitespace between the </Location> and <Item1> tags.
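If what you actually want is the next element, one option is to skip over the non-element siblings yourself. This is only a sketch: nextElementSibling and parse are hypothetical helpers written for this answer, not part of the DOM API, and the XML is inlined for brevity:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NextElementSibling {
    // Hypothetical helper: parses an XML string with the default (non-validating) factory.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Hypothetical helper: returns the next sibling that is an element, or null if none exists.
    static Element nextElementSibling(Node node) {
        Node sibling = node.getNextSibling();
        while (sibling != null && sibling.getNodeType() != Node.ELEMENT_NODE) {
            sibling = sibling.getNextSibling();   // skip whitespace text, comments, etc.
        }
        return (Element) sibling;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(
                "<DeliveryLocations>\n"
              + "    <Location>North East </Location>\n"
              + "    <Item1>Full</Item1>\n"
              + "</DeliveryLocations>");
        Node location = doc.getElementsByTagName("Location").item(0);
        System.out.println(location.getNextSibling().getNodeName());      // #text
        System.out.println(nextElementSibling(location).getNodeName());   // Item1
    }
}
```

The plain getNextSibling() call lands on the whitespace text node (whose node name is #text), while the helper keeps stepping sideways until it reaches <Item1>.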

    The nodes of the DOM-tree for <DeliveryLocations> are:

    
    ┌───────────────────┐                    ┌───────────────┐
    │      ELEMENT      │ →→→ firstChild →→→ │   TEXT NODE   │
    │ DeliveryLocations │                    │ <whitespaces> │
    └───────────────────┘                    └───────────────┘
         ↓         ↑                  nextSibling ↓     ↑ previousSibling
         ↓         ↑                           ┌───────────┐                              ┌───────────────┐
         ↓         ↑←←← parentNode ←←←←←←←←←←← │  ELEMENT  │ →→→ firstChild/lastChild →→→ │   TEXT NODE   │
         ↓         ↑                           │ Location  │ ←←←←←←←← parentNode ←←←←←←←← │ "North East " │
         ↓         ↑                           └───────────┘                              └───────────────┘
         ↓         ↑                  nextSibling ↓     ↑ previousSibling
         ↓         ↑                         ┌───────────────┐
         ↓         ↑←←← parentNode ←←←←←←←←← │   TEXT NODE   │
         ↓         ↑                         │ <whitespaces> │
         ↓         ↑                         └───────────────┘
         ↓         ↑                  nextSibling ↓     ↑ previousSibling
         ↓         ↑                            ┌─────────┐                              ┌───────────┐
         ↓         ↑←←← parentNode ←←←←←←←←←←←← │ ELEMENT │ →→→ firstChild/lastChild →→→ │ TEXT NODE │
         ↓         ↑                            │  Item1  │ ←←←←←←←← parentNode ←←←←←←←← │  "Full"   │
         ↓         ↑                            └─────────┘                              └───────────┘
         ↓         ↑                  nextSibling ↓     ↑ previousSibling
         ↓         ↑                         ┌───────────────┐
         ↓         ↑←←← parentNode ←←←←←←←←← │   TEXT NODE   │
         ↓         ↑                         │ <whitespaces> │
         ↓         ↑                         └───────────────┘
         ↓         ↑                  nextSibling ↓     ↑ previousSibling
         ↓         ↑                            ┌─────────┐                              ┌───────────┐
         ↓         ↑←←← parentNode ←←←←←←←←←←←← │ ELEMENT │ →→→ firstChild/lastChild →→→ │ TEXT NODE │
         ↓         ↑                            │  Item2  │ ←←←←←←←← parentNode ←←←←←←←← │  "Empty"  │
         ↓         ↑                            └─────────┘                              └───────────┘
         ↓         ↑                  nextSibling ↓     ↑ previousSibling
         ↓         ↑                         ┌───────────────┐
         ↓         ↑←←← parentNode ←←←←←←←←← │   TEXT NODE   │
         ↓                                   │ <whitespaces> │
         →→→→ lastChild →→→→→→→→→→→→→→→→→→→→ └───────────────┘
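
You can confirm the sibling chain in the diagram by walking it from firstChild with getNextSibling(). The sketch below assumes the same <DeliveryLocations> content as in the question, inlined as a string; ChildDump and describeChildren are names invented for this example:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ChildDump {
    // Hypothetical helper: lists the direct children of the first element named tagName, in order.
    static List<String> describeChildren(String xml, String tagName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Node parent = doc.getElementsByTagName(tagName).item(0);
        List<String> result = new ArrayList<>();
        // Start at the first child and step sideways until there are no more siblings.
        for (Node child = parent.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.TEXT_NODE) {
                String value = child.getNodeValue();
                result.add(value.trim().isEmpty() ? "TEXT <whitespace>" : "TEXT \"" + value + "\"");
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                result.add("ELEMENT " + child.getNodeName());
            } else {
                result.add("OTHER type " + child.getNodeType());
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml =
                "<DeliveryLocations>\n"
              + "    <Location>North East </Location>\n"
              + "    <Item1>Full</Item1>\n"
              + "    <Item2>Empty</Item2>\n"
              + "</DeliveryLocations>";
        for (String line : describeChildren(xml, "DeliveryLocations")) {
            System.out.println(line);
        }
    }
}
```

For this input the list has seven entries: a whitespace text node before, between, and after each of the three elements, matching the left-hand column of the diagram.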