Tags: java, xml, xml-parsing, jdom-2

How to ignore commented content while parsing XML using JDOM2


I am having a problem parsing my XML with the JDOM parser. When I retrieve the content, it also returns the commented-out lines. Is there a way to ignore these comments?

Java Code:

    SAXBuilder jdomBuilder = new SAXBuilder();
    // jdomDocument is the JDOM2 Object
    Document jdomDocument = jdomBuilder.build("C:/manu/WebservicesWS/DynamicXmlParse/src/PO_XML.xml");
    // The root element is the root of the document. we print its name
    System.out.println(jdomDocument.getRootElement().getName()); // prints
                                                                    // "rss"
    Element rss = jdomDocument.getRootElement();
    System.out.println(rss.getNamespaceURI());
    List<Element> rssChildren = rss.getChildren();
    // getElement(rssChildren);
    for (int i = 0; i < rssChildren.size(); i++) {
        Element rssChild = rssChildren.get(i);
        System.out.println(rssChild.getName());// prints 'title' and 'link'
        List<Content> rssContents = rssChild.getContent();
        for (int j = 0; j < rssContents.size(); j++) {
            Content content = rssContents.get(j);
            System.out.println(content.getValue());
        }
    }

XML Structure

<interchange-control-header>
    <control-number>2</control-number>
    <sender-id>ZZ:IQAAOBUYER7</sender-id>
    <receiver-id>ZZ:33347456972</receiver-id>
    <!--sender-id>ZZ:IQAAOBUYER2</sender-id>
    <receiver-id>ZZ:IQAAOSUPPLIER2</receiver-id>        
    <sender-id>IQAOrionBuyer</sender-id>
    <receiver-id>IQAOrionSupplier</receiver-id-->           
    <date-time>2012-06-29T09:30:47-05:00</date-time>
    <control-version>1</control-version>
    <usage-indicator>T</usage-indicator>
    <is-copy>0</is-copy>
</interchange-control-header>

Current output:

interchange-control-header
2
ZZ:IQAAOBUYER7
ZZ:33347456972
sender-id>ZZ:IQAAOBUYER2</sender-id>
    <receiver-id>ZZ:IQAAOSUPPLIER2</receiver-id>        
    <sender-id>IQAOrionBuyer</sender-id>
    <receiver-id>IQAOrionSupplier</receiver-id
2012-06-29T09:30:47-05:00
1
T
0

Required output:

interchange-control-header
2
ZZ:IQAAOBUYER7
ZZ:33347456972
2012-06-29T09:30:47-05:00
1
T
0

Solution

  • Comments are a distinct kind of content in an XML document, alongside the more obvious things like Elements. Other content types to be aware of are Processing Instructions, Text, and Entity References.

    When you call getContent() on the rssChild Element, the returned list includes the Comment content, and its value is the text inside the comment.

    It appears you just want to print out the text content of each child element, not of all content.

    The simple way to get only the child elements is to call getChildren() instead of getContent(). You already use getChildren() on the root element, so the same call works here.

    Additionally, you can simplify the loops to for-each style. This code:

    List<Element> rssChildren = rss.getChildren();
    // getElement(rssChildren);
    for (int i = 0; i < rssChildren.size(); i++) {
        Element rssChild = rssChildren.get(i);
        System.out.println(rssChild.getName());// prints 'title' and 'link'
        List<Content> rssContents = rssChild.getContent();
        for (int j = 0; j < rssContents.size(); j++) {
            Content content = rssContents.get(j);
            System.out.println(content.getValue());
        }
    }
    

    could be:

    for (Element rssChild : rss.getChildren()) {
        // prints the name of each child element of the root
        System.out.println(rssChild.getName());
        // getChildren() returns only Elements, so Comments are never visited
        for (Element subRss : rssChild.getChildren()) {
            System.out.println(subRss.getValue());
        }
    }
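
To check the distinction between comment and element content without JDOM2 on the classpath, here is a minimal sketch that uses the JDK's built-in DOM parser (javax.xml.parsers and org.w3c.dom) as a stand-in. It is not JDOM2 code. The class name ElementOnlyValues and the helper childElementValues are made up for illustration, and the XML is a shortened copy of the sample from the question. The helper keeps only child nodes whose type is ELEMENT_NODE, which is the same filtering that getChildren() does for you in JDOM2.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ElementOnlyValues {

    // Hypothetical helper (not part of JDOM2): returns the trimmed text of each
    // child element of the root, skipping comments, whitespace-only text nodes,
    // and every other non-element node type.
    static List<String> childElementValues(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getDocumentElement().getChildNodes();
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Node node = nodes.item(i);
                // A comment has node type COMMENT_NODE, so it fails this check.
                if (node.getNodeType() == Node.ELEMENT_NODE) {
                    values.add(node.getTextContent().trim());
                }
            }
            return values;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Shortened version of the question's XML, including a commented-out element.
        String xml =
                "<interchange-control-header>"
              + "<control-number>2</control-number>"
              + "<sender-id>ZZ:IQAAOBUYER7</sender-id>"
              + "<!--sender-id>ZZ:IQAAOBUYER2</sender-id-->"
              + "<date-time>2012-06-29T09:30:47-05:00</date-time>"
              + "</interchange-control-header>";
        for (String value : childElementValues(xml)) {
            System.out.println(value);
        }
    }
}
```

On Java 11 or later this can be run directly with `java ElementOnlyValues.java`. For the shortened sample it prints 2, ZZ:IQAAOBUYER7 and 2012-06-29T09:30:47-05:00, one per line, and the commented-out sender-id does not appear.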