
How to handle a missing tag when parsing XML using DOM?


I'm trying to parse XML using DOM, but one of the tags is missing in some items. How can I handle this case so that, when the tag is absent, some other string (for example, "No data") is written to the ArrayList instead? I'm new to Java and wrote the code below by following a tutorial.

The XML looks similar to this:

<item>
<title>number 1</title>
</item>

<item>
<title>number 2</title>
<enclosure url="https://www.google.com/"/>
</item>

Part of the Java code:

List<String> title = new ArrayList<>(); 
List<String> enclosure = new ArrayList<>();

NodeList itemNodeList = document.getElementsByTagName("item");
for (int i = 0; i < itemNodeList.getLength(); i++){
    if (itemNodeList.item(i).getNodeType() == Node.ELEMENT_NODE){
        Element itemElement = (Element) itemNodeList.item(i);

        NodeList childNodes = itemElement.getChildNodes();
        for (int j = 0; j < childNodes.getLength(); j++){
            if (childNodes.item(j).getNodeType() == Node.ELEMENT_NODE){
                Element childElement = (Element) childNodes.item(j);
                
                switch (childElement.getNodeName()){
                    case "title":{
                        title.add(childElement.getTextContent());  
                        break;
                    }
                    case "enclosure":{
                        String info = childElement.getAttribute("url");
                        enclosure.add(info); 
                        break;
                    }
                }
            }
        }
    }
}

Solution

  • Tell me how to handle this case so that in its absence, some other string is written to the ArrayList (for example, "No data")?

    If you want defaults for the "title" and "enclosure" values, then set them on a per-item basis, before you examine that item's child nodes. For example,

            // ...
            Element itemElement = (Element) itemNodeList.item(i);
            String itemTitle = "No title";
            String itemEnclosure = "no data";
            // ...
    

    Then overwrite those defaults as you parse the child nodes, instead of adding to your lists immediately. For example,

                        // ...
                        case "enclosure":{
                            itemEnclosure = childElement.getAttribute("url");
                            break;
                        }
                        // ...
    

    After you have processed all the children of each item node, you have the appropriate information to add to your lists, whether it was read from child nodes or remains the default:

            // ...
            for (int j = 0; j < childNodes.getLength(); j++){
                // ...
            }
            title.add(itemTitle);
            enclosure.add(itemEnclosure);
            // ...
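
Put together, the three fragments above might look like the following minimal sketch. The class name `ItemDefaultsSketch`, the `Parsed` holder, the `<rss>` wrapper element, and the use of "No data" as the default for both values are assumptions made for illustration; they are not part of the original code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class ItemDefaultsSketch {

    /** Hypothetical holder for the two parallel lists from the question. */
    public static final class Parsed {
        public final List<String> title = new ArrayList<>();
        public final List<String> enclosure = new ArrayList<>();
    }

    public static Parsed parse(String xml) {
        Document document;
        try {
            document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }

        Parsed result = new Parsed();
        NodeList itemNodeList = document.getElementsByTagName("item");
        for (int i = 0; i < itemNodeList.getLength(); i++) {
            // getElementsByTagName only returns elements, so the cast is safe.
            Element itemElement = (Element) itemNodeList.item(i);

            // Per-item defaults; they survive only if the child is absent.
            String itemTitle = "No data";
            String itemEnclosure = "No data";

            NodeList childNodes = itemElement.getChildNodes();
            for (int j = 0; j < childNodes.getLength(); j++) {
                Node child = childNodes.item(j);
                if (child.getNodeType() != Node.ELEMENT_NODE) {
                    continue; // skip whitespace text nodes, comments, etc.
                }
                Element childElement = (Element) child;
                switch (childElement.getNodeName()) {
                    case "title":
                        itemTitle = childElement.getTextContent();
                        break;
                    case "enclosure":
                        itemEnclosure = childElement.getAttribute("url");
                        break;
                    default:
                        break;
                }
            }

            // Exactly one entry per item goes into each list.
            result.title.add(itemTitle);
            result.enclosure.add(itemEnclosure);
        }
        return result;
    }

    public static void main(String[] args) {
        // The <rss> root is an assumed wrapper; the question shows only the items.
        String xml = "<rss>"
                + "<item><title>number 1</title></item>"
                + "<item><title>number 2</title>"
                + "<enclosure url=\"https://www.google.com/\"/></item>"
                + "</rss>";
        Parsed parsed = parse(xml);
        System.out.println(parsed.title);     // [number 1, number 2]
        System.out.println(parsed.enclosure); // [No data, https://www.google.com/]
    }
}
```

Because each item contributes exactly one entry to each list, index `i` in `title` and in `enclosure` always refers to the same item. Note that `getAttribute` returns an empty string when the `url` attribute itself is missing, so an `<enclosure/>` without a `url` would store "" rather than the default.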
    

    Be aware, however, that if any of your items contain more than one title or more than one enclosure, then only the last one encountered will be captured, because each new value overwrites the previous one. If you want to be able to support multiples like that, then you need a different internal representation for your data.
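
One possible representation for the multiple-value case is a small per-item object that holds a list for each kind of child. This is only a sketch: the class names `MultiValueItemsSketch` and `Item`, the field names, and the `<rss>` wrapper element are hypothetical and chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class MultiValueItemsSketch {

    /** Hypothetical per-item holder: one list per kind of child element. */
    public static final class Item {
        public final List<String> titles = new ArrayList<>();
        public final List<String> enclosureUrls = new ArrayList<>();
    }

    public static List<Item> parse(String xml) {
        Document document;
        try {
            document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }

        List<Item> items = new ArrayList<>();
        NodeList itemNodeList = document.getElementsByTagName("item");
        for (int i = 0; i < itemNodeList.getLength(); i++) {
            Element itemElement = (Element) itemNodeList.item(i);
            Item item = new Item();

            NodeList childNodes = itemElement.getChildNodes();
            for (int j = 0; j < childNodes.getLength(); j++) {
                Node child = childNodes.item(j);
                if (child.getNodeType() != Node.ELEMENT_NODE) {
                    continue;
                }
                Element childElement = (Element) child;
                switch (childElement.getNodeName()) {
                    case "title":
                        item.titles.add(childElement.getTextContent());
                        break;
                    case "enclosure":
                        item.enclosureUrls.add(childElement.getAttribute("url"));
                        break;
                    default:
                        break;
                }
            }
            items.add(item); // an empty list means that child was absent
        }
        return items;
    }

    public static void main(String[] args) {
        String xml = "<rss>"
                + "<item><title>number 1</title><title>alt 1</title></item>"
                + "<item><enclosure url=\"https://www.google.com/\"/></item>"
                + "</rss>";
        for (Item item : parse(xml)) {
            System.out.println(item.titles + " " + item.enclosureUrls);
        }
    }
}
```

With this shape, an absent child shows up as an empty list, so a placeholder such as "No data" can be chosen at display time instead of being stored alongside real values.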