Parsing a multilevel XML file using Java (DOM parser)


Here is an example of my XML file:

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="xslt/options.xsl"?>
    <options>
      <version>0001</version>
      <title>ConfigData</title>
      <category>
        <name>GConfigData</name>
        <option>
          <name>String_name</name>
          <value>350.16.01a</value>
          <control>
            <type>TextBox2</type>
            <caption> String Name</caption>
            <left>0</left>
            <top>0</top>
            <width>2600</width>
            <height>900</height>
            <font>Courier</font>
            <scroll_bar>0</scroll_bar>
          </control>
        </option>
        <option>
          <name>FileID</name>
          <value>1601</value>
          <control>
            <type>TextBox2</type>
            <caption>file version</caption>
            <left>0</left>
            <top>900</top>
            <width>2600</width>
            <height>900</height>
            <font>Courier</font>
            <scroll_bar>0</scroll_bar>
          </control>
        </option>
        <option>
          <name>systemID</name>
          <value>0</value>
          <control>
            <type>TextBox2</type>
            <caption>System ID</caption>
            <left>0</left>
            <top>1800</top>
            <width>2400</width>
            <height>900</height>
            <font>Courier</font>
            <scroll_bar>0</scroll_bar>
          </control>
        </option>
        <option>
          <name>SyncTime</name>
          <value>2</value>
          <control>
            <type>TextBox2</type>
            <caption>Sync Time</caption>
            <left>0</left>
            <top>2700</top>
            <width>2400</width>
            <height>900</height>
            <font>Courier</font>
            <scroll_bar>0</scroll_bar>
          </control>
        </option>
        <option>
          <name>UseServer</name>
          <value>0</value>
          <control>
            <type>TextBox2</type>
            <caption>Use Server</caption>
            <left>0</left>
            <top>3600</top>
            <width>2400</width>
            <height>900</height>
            <font>Courier</font>
            <scroll_bar>0</scroll_bar>
          </control>
        </option>
        <option>
          <name>CommType</name>
          <value>0</value>
          <control>
            <type>FixedList</type>
            <caption>Comm Type</caption>
            <left>0</left>
            <top>4500</top>
            <width>2400</width>
            <height>900</height>
            <list>                                              
              <item>
                <text>Parellel</text>
                <value>0</value>
              </item>
              <item>
                <text>Simple Serial</text>
                <value>1</value>
              </item>
              <item>
                <text>Complex Serial</text>
                <value>2</value>
              </item>
            </list>
          </control>
        </option>
        <option>
          <name>YYBasis</name>
          <value>70</value>
          <control>
            <type>TextBox2</type>
            <caption>Set YY Basis</caption>
            <left>0</left>
            <top>5400</top>
            <width>2400</width>
            <height>900</height>
            <font>Courier</font>
            <scroll_bar>0</scroll_bar>
          </control>
        </option>
        <option>
          <name>Separator</name>
          <value>46</value>
          <control>
            <type>TextBox2</type>
            <caption>Separator</caption>
            <left>0</left>
            <top>6300</top>
            <width>2400</width>
            <height>900</height>
            <font>Courier</font>
            <scroll_bar>0</scroll_bar>
          </control>
        </option>
        <option>
          <name>WholeSeparator</name>
          <value>44</value>
          <control>
            <type>TextBox2</type>
            <caption>Whole Separator</caption>
            <left>0</left>
            <top>7200</top>
            <width>2400</width>
            <height>900</height>
            <font>Courier</font>
            <scroll_bar>0</scroll_bar>
          </control>
        </option>
        <option>
          <name>DateFormat</name>
          <value>0</value>
          <control>
            <type>FixedList</type>
            <caption>Date Format</caption>
            <left>2600</left>
            <top>0</top>
            <width>2400</width>
            <height>900</height>
            <list>
              <item>
                <text>MM/DD/YY</text>
                <value>0</value>
              </item>
              <item>
                <text>MM/DD/YYYY</text>
                <value>1</value>
              </item>
              <item>
                <text>DD/MM/YY</text>
                <value>2</value>
              </item>
              <item>
                <text>DD/MM/YYYY</text>
                <value>3</value>
              </item>
              <item>
                <text>YY/MM/DD</text>
                <value>4</value>
              </item>
              <item>
                <text>MM.DD.YY</text>
                <value>6</value>
              </item>
              <item>
                <text>MM.DD.YYYY</text>
                <value>7</value>
              </item>
              <item>
                <text>DD.MM.YY</text>
                <value>8</value>
              </item>
              <item>
                <text>DD.MM.YYYY</text>
                <value>9</value>
              </item>
              <item>
                <text>YY.MM.DD</text>
                <value>10</value>
              </item>
              <item>
                <text>YYYY.MM.DD</text>
                <value>11</value>
              </item>
            </list>
          </control>
        </option>
      </category>
    </options>

I wrote Java code to parse the name, caption and value of each option. Here is the code:

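import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class XMLParsingSingleFileFinal {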



    public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException
       {
          //Get Document Builder
          DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
          DocumentBuilder builder = factory.newDocumentBuilder();

          //Build Document
          Document document = builder.parse(new File("options.xml"));

          //Normalize the XML Structure; It's just too important !!
          document.getDocumentElement().normalize();
          XPath xPath =  XPathFactory.newInstance().newXPath();

          //Here comes the root node
          Element root = document.getDocumentElement();
          System.out.println(root.getNodeName());

          //Get all options
          NodeList nList = document.getElementsByTagName("options");
          System.out.println("Total Options = " + nList.getLength());
          System.out.println("TITLE = " + document.getElementsByTagName("title").item(0).getTextContent());
          System.out.println("VERSION = " + document.getElementsByTagName("version").item(0).getTextContent());

          System.out.println("===================================");

          //Get all category
          NodeList nList1 = document.getElementsByTagName("category");
          System.out.println("Total Category inside options = " + nList1.getLength());
          //int count1 = nList1.getLength();


          for (int temp = 0; temp < nList1.getLength(); temp++)
          {
             Node node = nList1.item(temp);
             if (node.getNodeType() == Node.ELEMENT_NODE)
             {
                 Element mElement = (Element) node;
                 System.out.println("\nCategory Name = " + mElement.getElementsByTagName("name").item(0).getTextContent());
                 NodeList nList2 = mElement.getElementsByTagName("option");
                 System.out.println("option inside category = " + nList2.getLength());
                 System.out.println("\n\t");
                // int count = nList2.getLength();


                 for (int temp1 = 0; temp1 < nList2.getLength()/2; temp1++) 
                {

                    Node nNode = nList2.item(temp1);
                    if (nNode.getNodeType() == Node.ELEMENT_NODE)
                    {

                    Element nElement = (Element) nNode;

                 System.out.println("\tOption Name = " + mElement.getElementsByTagName("name").item(temp1+1).getTextContent());
                 System.out.println("\t\tCaption Name = " + mElement.getElementsByTagName("caption").item(temp1).getTextContent());

                 System.out.println("\t\tValue = " + mElement.getElementsByTagName("value").item(temp1).getTextContent());



                 System.out.println("\n\t");

            }

              }  
                 System.out.println("\n\t");
             }   
          }   
       }
}

My main aim is to read the "value" of each "option" node.

As you can see, the "option" named CommType contains "item" elements, each of which also has a child node "value".

While parsing, the output is correct up to the option named CommType. From the next option onward, the code picks up the "value" of the "item" child nodes of the previous option.

Example (parse result):

options
Total Options = 1
TITLE = ConfigData
VERSION = 0001
===================================
Total Category inside options = 23

Category Name = GConfigData
option inside category = 38


    Option Name = String_name
        Caption Name = String Name
        Value = 350.16.01a


    Option Name = FileID
        Caption Name =  file version
        Value = 1601


    Option Name = SystemID
        Caption Name = System ID
        Value = 0


    Option Name = SyncTime
        Caption Name = Sync Time
        Value = 2


    Option Name = UseServer
        Caption Name = Use Server
        Value = 0


    Option Name = CommType
        Caption Name = Comm Type
        Value = 0


    Option Name = YYBasis
        Caption Name = Set YY Basis
        Value = 0        /* Should be 70 as in the XML file, but it takes the value of option(Name:CommType)/control/list/item(text:Parellel)/value */


    Option Name = Separator
        Caption Name =  Separator
        Value = 1       /* Should be 46 as in the XML file, but it takes the value of option(Name:CommType)/control/list/item(text:Simple Serial)/value */


    Option Name =WholeSeparator
        Caption Name = Whole Separator
        Value = 2     /* Should be 44 as in the XML file, but it takes the value of option(Name:CommType)/control/list/item(text:Complex Serial)/value */


    Option Name = DateFormat
        Caption Name = Date Format
        Value = 70    /* Should be 0 */

After the option named CommType, the value of every option is parsed incorrectly.

What could be the solution? I am new to Java as well as XML.

PS: This is my first question on this forum. I apologize for any spelling mistakes or if the question is poorly phrased. Please help me in any way you can.


Solution
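
The root cause can be reproduced in isolation. `Element.getElementsByTagName` returns every descendant with the given tag in document order, not just direct children, so the "value" list for a category also contains the "value" elements nested under "item". The sketch below uses a trimmed-down, hypothetical copy of the XML (the class name `TagNameDrift` and the helper methods are illustrative only) to show that the "option" and "value" lists have different lengths, which is why a shared index drifts after CommType.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class TagNameDrift {
    // Hypothetical, shortened version of the file in the question.
    static final String XML =
        "<options><category><name>GConfigData</name>"
      + "<option><name>CommType</name><value>0</value><control><list>"
      + "<item><text>Parellel</text><value>0</value></item>"
      + "<item><text>Simple Serial</text><value>1</value></item>"
      + "</list></control></option>"
      + "<option><name>YYBasis</name><value>70</value><control/></option>"
      + "</category></options>";

    // All descendants of the first <category> with the given tag, in document order.
    static NodeList descendants(String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
            Element category = (Element) doc.getElementsByTagName("category").item(0);
            return category.getElementsByTagName(tag);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    static int count(String tag) {
        return descendants(tag).getLength();
    }

    static String textAt(String tag, int index) {
        return descendants(tag).item(index).getTextContent();
    }

    public static void main(String[] args) {
        // Two options, but four <value> elements: two of them belong to <item>.
        System.out.println("options = " + count("option"));
        System.out.println("values  = " + count("value"));
        // Index 1 is the second option (YYBasis), yet the second <value>
        // in document order is the one inside the first <item>.
        System.out.println("value[1] = " + textAt("value", 1));
        System.out.println("value[3] = " + textAt("value", 3));
    }
}
```

The same effect explains the `temp1+1` offset on "name" in the question: the category's own "name" element comes first in that list.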

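  • Don't address nodes by index or offset (hardcoding positions is an anti-pattern); it makes your code fragile. With dom4j you can select each option and read its children through paths relative to that option: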

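    // dom4j classes: org.dom4j.io.SAXReader, org.dom4j.Document, org.dom4j.Node
    SAXReader reader = new SAXReader();
    Document document = reader.read(new File("options.xml"));
    List<Node> nodes = document.selectNodes("/options/category/option");
    
    for (Node node : nodes) {
        // Paths are relative to the current option, so nested item/value nodes are never matched
        System.out.println("caption: " + node.selectSingleNode("control/caption").getText());
        System.out.println("value : " + node.selectSingleNode("value").getText());
    }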
    

    example output (truncated):

    caption:  String Name
    value : 350.16.01a
    caption: file version
    value : 1601
    caption: System ID
    value : 0
    

    dependencies required:

    <dependency>
        <groupId>jaxen</groupId>
        <artifactId>jaxen</artifactId>
        <version>1.1.6</version>
    </dependency>
    
    <dependency>
        <groupId>dom4j</groupId>
        <artifactId>dom4j</artifactId>
        <version>1.6.1</version>
    </dependency>
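
If adding dom4j and jaxen is not an option, the same idea can probably be expressed with the `javax.xml.xpath` API that ships with the JDK (the question's code already creates an `XPath` object but never uses it). The following is a minimal sketch, not a drop-in replacement: the class name `OptionReader`, the `readOptions` helper and the shortened `SAMPLE` string are assumptions made for illustration. Each relative expression is evaluated against one "option" node, so "value" matches only that option's direct child.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class OptionReader {
    // Hypothetical, shortened version of the file in the question.
    static final String SAMPLE =
        "<options><category><name>GConfigData</name>"
      + "<option><name>CommType</name><value>0</value><control>"
      + "<caption>Comm Type</caption><list>"
      + "<item><text>Parellel</text><value>0</value></item>"
      + "<item><text>Simple Serial</text><value>1</value></item>"
      + "</list></control></option>"
      + "<option><name>YYBasis</name><value>70</value><control>"
      + "<caption>Set YY Basis</caption></control></option>"
      + "</category></options>";

    // Returns one "name | caption | value" line per <option>.
    static List<String> readOptions(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            XPath xpath = XPathFactory.newInstance().newXPath();

            // Select the option elements themselves, not their descendants.
            NodeList options = (NodeList) xpath.evaluate(
                "/options/category/option", doc, XPathConstants.NODESET);

            List<String> rows = new ArrayList<>();
            for (int i = 0; i < options.getLength(); i++) {
                Node option = options.item(i);
                // Relative paths: evaluated with this option as the context node.
                String name = xpath.evaluate("name", option);
                String caption = xpath.evaluate("control/caption", option).trim();
                String value = xpath.evaluate("value", option);
                rows.add(name + " | " + caption + " | " + value);
            }
            return rows;
        } catch (Exception e) {
            throw new IllegalStateException("could not read options", e);
        }
    }

    public static void main(String[] args) {
        readOptions(SAMPLE).forEach(System.out::println);
    }
}
```

To read the real file instead of the inline sample, `DocumentBuilder.parse(new File("options.xml"))` can be used in place of the byte stream, as in the question's code.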