
Access inner XML data with Java and Xerces


I am trying to parse an XML document using Xerces, but I can't seem to access the data within the elements. Below is a sample XML document:

<sample>
<block>
    <name>tom</name>
    <age>44</age>
    <car>BMW</car>
</block>
<block>
    <name>Jenny</name>
    <age>23</age>
    <car>Ford</car>
</block>
</sample>

So far the only output I can produce is:

Sample
    block
      name
        age
          car
    block
      name
        age
          car

This is just a list of the node names. I have tried node.getValue(), but that just returns null, so I'm guessing that's wrong!

How can I access the data inside? Here is the basic code so far:

import org.apache.xerces.parsers.DOMParser;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public static void display(String file) {
    try {
        DOMParser parser = new DOMParser();
        parser.parse(file);
        Document doc = parser.getDocument();
        read(doc);
    } catch (Exception e) {
        e.printStackTrace(System.err);
    }
}

public static void read(Node node) {
    if (node == null) {
        return;
    }
    int type = node.getNodeType();
    switch (type) {
    case Node.DOCUMENT_NODE: {
        // Start the walk at the root element.
        read(((Document) node).getDocumentElement());
        break;
    }
    case Node.TEXT_NODE:
        break;
    case Node.ELEMENT_NODE: {
        System.out.println(node.getNodeName());
        // Recurse into every child node.
        NodeList child = node.getChildNodes();
        if (child != null) {
            int length = child.getLength();
            for (int i = 0; i < length; i++) {
                read(child.item(i));
            }
        }
        break;
    }
    }
}

Solution

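  • getNodeValue() returns the value of a text node, which you currently skip over. For an element node it always returns null, because an element's text is held in its child text nodes, not in the element itself.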

    public static void read(Node node) {
        if (node == null) {
            return;
        }
    
        int type = node.getNodeType();
        switch (type) {
        case Node.DOCUMENT_NODE: {
            System.out.println("Doc node; name: " + node.getNodeName());
            read(((Document) node).getDocumentElement());
            break;
        }
    
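        case Node.TEXT_NODE:
            // The element's data lives here. Stripping all whitespace also blanks
            // the indentation-only text nodes that sit between elements.
            System.out.println("Text node; value: " + node.getNodeValue().replaceAll("\\s", ""));
            break;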
    
        case Node.ELEMENT_NODE: {
            System.out.println("Element node; name: " + node.getNodeName());
            NodeList children = node.getChildNodes();
            int length = children.getLength();
            for (int i = 0; i < length; i++) {
                read(children.item(i));
            }
            break;
        }
        }
    }
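
    If all you need is the text inside each element, a shorter route is Node.getTextContent(), which concatenates the text-node descendants of an element. The sketch below is only an illustration: it uses the JDK's built-in DocumentBuilderFactory instead of the Xerces DOMParser from the question, inlines the sample XML, and the class and helper names (BlockReader, childText, readBlocks) are made up for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class BlockReader {

    // The sample document from the question, inlined so the sketch is self-contained.
    static final String SAMPLE_XML =
        "<sample>"
        + "<block><name>tom</name><age>44</age><car>BMW</car></block>"
        + "<block><name>Jenny</name><age>23</age><car>Ford</car></block>"
        + "</sample>";

    // Returns the trimmed text of the first descendant element with the given
    // tag name, or an empty string if there is no such element.
    static String childText(Element parent, String tag) {
        NodeList matches = parent.getElementsByTagName(tag);
        return matches.getLength() == 0 ? "" : matches.item(0).getTextContent().trim();
    }

    // Builds one "name, age, car" row per <block> element.
    static List<String> readBlocks(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        List<String> rows = new ArrayList<>();
        NodeList blocks = doc.getElementsByTagName("block");
        for (int i = 0; i < blocks.getLength(); i++) {
            Element block = (Element) blocks.item(i);
            rows.add(childText(block, "name") + ", "
                + childText(block, "age") + ", "
                + childText(block, "car"));
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        for (String row : readBlocks(SAMPLE_XML)) {
            System.out.println(row);
        }
    }
}
```

    Running main prints one line per block: "tom, 44, BMW" and "Jenny, 23, Ford".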
    

    I think where you might be getting confused is how XML is actually structured, and what the children of something like this are:

    <element>
      <child_element>foo</child_element>
    </element>
    

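    Here <element> has three child nodes: a whitespace-only text node, <child_element>, and another whitespace-only text node. The string foo is not the value of <child_element> itself; it is the value of the text node that is its only child. The read() method above prints each of those nodes, which may help explain.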
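
    To see that structure directly, a small sketch like the following lists the children of a node by type. It is an illustration only: it assumes the JDK's default DocumentBuilderFactory (which keeps whitespace-only text nodes when no DTD is involved), and the names ChildLister, describeChildren and parseRoot are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ChildLister {

    // The small document from the answer, including its line breaks and indentation.
    static final String XML =
        "<element>\n  <child_element>foo</child_element>\n</element>";

    // Describes each direct child of the given node as text or element.
    static List<String> describeChildren(Node parent) {
        List<String> out = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                boolean blank = child.getNodeValue().trim().isEmpty();
                out.add(blank
                    ? "text (whitespace only)"
                    : "text \"" + child.getNodeValue() + "\"");
            } else {
                out.add("element <" + child.getNodeName() + ">");
            }
        }
        return out;
    }

    // Parses the XML string and returns its root element.
    static Node parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)))
            .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Node root = parseRoot(XML);
        // Children of <element>: whitespace, <child_element>, whitespace.
        System.out.println(describeChildren(root));
        // Children of <child_element>: the single text node holding "foo".
        System.out.println(describeChildren(root.getChildNodes().item(1)));
    }
}
```

    The first list has three entries, two of them whitespace-only text nodes, which is why a walk that prints every text node also prints seemingly empty values.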

    It's also why tools like dom4j, JAXB, and XPath make things much easier: they let you ask for the values you want instead of walking every node yourself.
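
    As a rough illustration of the XPath route, the sketch below uses the javax.xml.xpath API that ships with the JDK to pull values out of the sample document from the question. The class name XPathSketch, the helper selectValues, and the inlined XML string are assumptions made for this example, not part of the original code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathSketch {

    // The sample document from the question, inlined so the sketch is self-contained.
    static final String SAMPLE_XML =
        "<sample>"
        + "<block><name>tom</name><age>44</age><car>BMW</car></block>"
        + "<block><name>Jenny</name><age>23</age><car>Ford</car></block>"
        + "</sample>";

    // Evaluates an XPath expression and returns the value of every node it selects.
    static List<String> selectValues(String xml, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
            expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getNodeValue());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Select the text node under every <name> element.
        System.out.println(selectValues(SAMPLE_XML, "/sample/block/name/text()"));
        // Select the car of the block whose name is Jenny.
        System.out.println(selectValues(SAMPLE_XML, "/sample/block[name='Jenny']/car/text()"));
    }
}
```

    Note that the expressions end in text(): the values still come from text nodes, XPath just does the navigation for you.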