
How to parse XML and get the corresponding values using the StAX Iterator API?


I would like to parse an XML document using the StAX Iterator API and get the value of each id node. In the code below, how do I get the value of the id element with type=id2 or type=id3?

<entity>
   <id type="id1">8500123</id>
   <id type="id2">8500124</id>
   <id type="id3">8500125</id>
   <link idType="someId">99369</link>
</entity>

My StAX Iterator API code:

XMLEventReader xmlEventReader = xmlInputFactory.createXMLEventReader(new FileInputStream(fileName));
while (xmlEventReader.hasNext()) {
    XMLEvent xmlEvent = xmlEventReader.nextEvent();
    if (xmlEvent.isStartElement()) {
        StartElement startElement = xmlEvent.asStartElement();
        if (startElement.getName().getLocalPart().equals("entity")) {
            XMLEvent xmlEvent2 = xmlEventReader.nextEvent(); // skipped: it is always the newline after <entity>
            XMLEvent xmlEvent3 = xmlEventReader.nextEvent();
            if (xmlEvent3.isStartElement()) {
                StartElement startElement2 = xmlEvent3.asStartElement();
                if (startElement2.getName().getLocalPart().equals("id")) {
                    connector = new Connector();
                    Attribute idAttr = startElement2.getAttributeByName(new QName("type"));
                    // getName() returns a QName, so compare the attribute's value instead
                    if (idAttr.getValue().equals("id1")) {
                        connector.setId1(idAttr.getValue());
                    }
                }
            }
        }
    }
}

Solution

  • Since the question is old this is probably no longer an issue, but I was just trying to do the same thing. The sample code was almost there; the missing step was to check for an event type of XMLStreamConstants.CHARACTERS, which corresponds to either:

    • The data between an opening and closing tag.
    • Whitespace between tags.
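
    To make the two cases concrete, here is a minimal sketch (the class and method names are hypothetical) that parses the question's XML from an inlined string and uses Characters.isWhiteSpace() to separate whitespace-only CHARACTERS events from the real data. It assumes the JDK's built-in StAX implementation and sets XMLInputFactory.IS_COALESCING so each text node arrives as a single event.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Characters;
import javax.xml.stream.events.XMLEvent;

class CharactersDemo {

    // The XML from the question, inlined so the sketch needs no file.
    static final String SAMPLE = "<entity>\n"
            + "   <id type=\"id1\">8500123</id>\n"
            + "   <id type=\"id2\">8500124</id>\n"
            + "   <id type=\"id3\">8500125</id>\n"
            + "   <link idType=\"someId\">99369</link>\n"
            + "</entity>";

    /** Returns the data of every CHARACTERS event that is not whitespace-only. */
    static List<String> nonWhitespaceText(String xml) {
        XMLInputFactory factory = XMLInputFactory.newFactory();
        // Merge adjacent character data so each text node arrives as one event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        List<String> result = new ArrayList<>();
        try {
            XMLEventReader reader = factory.createXMLEventReader(new StringReader(xml));
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isCharacters()) {
                    Characters characters = event.asCharacters();
                    // The newline and indentation between tags is also a CHARACTERS event.
                    if (!characters.isWhiteSpace()) {
                        result.add(characters.getData());
                    }
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            // Rethrow unchecked to keep the sketch's signature simple.
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(nonWhitespaceText(SAMPLE));
    }
}
```

    Running main prints [8500123, 8500124, 8500125, 99369]. On its own this cannot tell which id each value belongs to, which is why the preceding start element also has to be taken into account.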

    So in your case you want to extract the data only if all of these conditions are met:

    • The event type being processed is XMLStreamConstants.CHARACTERS (in which case XMLEvent.isCharacters() returns true).
    • The immediately preceding event processed was of type XMLStreamConstants.START_ELEMENT.
    • The value of the type attribute of that preceding start element was "id2" or "id3".
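
    Expressed directly in code, the three conditions amount to remembering the previous event. This is a sketch, not the answer's own code: the method name valuesByType and the Set of wanted types are illustrative assumptions, and IS_COALESCING is enabled so a text node is not split across several CHARACTERS events.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.XMLEvent;

class ConditionCheckDemo {

    // The XML from the question, inlined so the sketch needs no file.
    static final String SAMPLE = "<entity>\n"
            + "   <id type=\"id1\">8500123</id>\n"
            + "   <id type=\"id2\">8500124</id>\n"
            + "   <id type=\"id3\">8500125</id>\n"
            + "   <link idType=\"someId\">99369</link>\n"
            + "</entity>";

    /** Maps each wanted type value to the text that directly follows its start tag. */
    static Map<String, String> valuesByType(String xml, Set<String> wantedTypes) {
        XMLInputFactory factory = XMLInputFactory.newFactory();
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        Map<String, String> result = new LinkedHashMap<>();
        try {
            XMLEventReader reader = factory.createXMLEventReader(new StringReader(xml));
            XMLEvent previous = null;
            while (reader.hasNext()) {
                XMLEvent current = reader.nextEvent();
                // Condition 1: the current event is character data.
                // Condition 2: the event just before it was a start element.
                if (current.isCharacters() && previous != null && previous.isStartElement()) {
                    Attribute type = previous.asStartElement().getAttributeByName(new QName("type"));
                    // Condition 3: that start element's type attribute is one we want.
                    if (type != null && wantedTypes.contains(type.getValue())) {
                        result.put(type.getValue(), current.asCharacters().getData());
                    }
                }
                previous = current;
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> wanted = new HashSet<>(Arrays.asList("id2", "id3"));
        System.out.println(valuesByType(SAMPLE, wanted));
    }
}
```

    With the question's XML and the set {id2, id3}, main prints {id2=8500124, id3=8500125}.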

    It's possible to do that by tweaking your existing code, but a cleaner and more generic approach is to process the events returned by XMLEventReader in a loop using a switch statement. To get the value of the data between a start tag and an end tag:

    Characters characters = xmlEvent.asCharacters();
    String data = characters.getData();
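
    As an alternative to handling CHARACTERS events yourself (a different technique from the one used in this answer), XMLEventReader.getElementText() reads the whole text content of a text-only element once nextEvent() has returned its START_ELEMENT. A minimal sketch, with a hypothetical idText helper:

```java
import java.io.StringReader;

import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

class ElementTextDemo {

    // The XML from the question, inlined so the sketch needs no file.
    static final String SAMPLE = "<entity>\n"
            + "   <id type=\"id1\">8500123</id>\n"
            + "   <id type=\"id2\">8500124</id>\n"
            + "   <id type=\"id3\">8500125</id>\n"
            + "   <link idType=\"someId\">99369</link>\n"
            + "</entity>";

    /** Returns the text of the first <id> whose type attribute equals wantedType, or null. */
    static String idText(String xml, String wantedType) {
        try {
            XMLEventReader reader = XMLInputFactory.newFactory()
                    .createXMLEventReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    XMLEvent event = reader.nextEvent();
                    if (event.isStartElement()) {
                        StartElement start = event.asStartElement();
                        Attribute type = start.getAttributeByName(new QName("type"));
                        if ("id".equals(start.getName().getLocalPart())
                                && type != null && wantedType.equals(type.getValue())) {
                            // Reads all text up to the matching end tag; afterwards the
                            // reader is positioned on that END_ELEMENT.
                            return reader.getElementText();
                        }
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(idText(SAMPLE, "id2"));
        System.out.println(idText(SAMPLE, "id3"));
    }
}
```

    Here idText(SAMPLE, "id2") returns "8500124", and a type that does not occur returns null. getElementText() throws an XMLStreamException if it meets a child element, so it only suits text-only elements like these.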
    

    Here's a working example, where the file sample.xml contains the data in the OP:

    package pkg;
    
    import java.io.FileReader;
    import java.io.IOException;
    import java.io.Reader;
    
    import javax.xml.namespace.QName;
    import javax.xml.stream.XMLEventReader;
    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.XMLStreamConstants;
    import javax.xml.stream.XMLStreamException;
    import javax.xml.stream.events.Attribute;
    import javax.xml.stream.events.Characters;
    import javax.xml.stream.events.EndElement;
    import javax.xml.stream.events.StartElement;
    import javax.xml.stream.events.XMLEvent;
    
    public class StaxDemo {
    
        public static void main(String[] args) throws XMLStreamException, IOException {
    
            try (Reader reader = new FileReader("sample.xml");) {
                XMLInputFactory xmlInputFactory = XMLInputFactory.newFactory();
                XMLEventReader xmlEventReader = xmlInputFactory.createXMLEventReader(reader);
                parseXml(xmlEventReader);
            }
        }
    
        static void parseXml(XMLEventReader xmlEventReader) throws XMLStreamException {
    
            String typeValue = null;
    
            while (xmlEventReader.hasNext()) {
                XMLEvent xmlEvent = xmlEventReader.nextEvent();
                switch (xmlEvent.getEventType()) {
    
                case XMLStreamConstants.START_DOCUMENT:
                    System.out.println("XMLEvent.START_DOCUMENT");
                    break;
    
                case XMLStreamConstants.START_ELEMENT:
                    StartElement startElement = xmlEvent.asStartElement();
                    Attribute typeAttribute = startElement.getAttributeByName(new QName("type"));
                    // Reset on every start tag so a stale value never carries over to an element without a type attribute.
                    typeValue = (typeAttribute != null) ? typeAttribute.getValue() : null;
                    System.out.println("XMLEvent.START_ELEMENT: <" + startElement.getName() + "> " + "type=" + typeValue);
                    break;
    
                case XMLStreamConstants.CHARACTERS:
                    Characters characters = xmlEvent.asCharacters();
                    if (typeValue != null) { // Non-null only between a start tag with a type attribute and its first text.
                        if (typeValue.equals("id2") || typeValue.equals("id3")) {
                            String data = characters.getData();
                            System.out.println("XMLEvent.CHARACTERS:    data=[" + data + "]");
                        }
                        typeValue = null;
                    }
                    break;
    
                case XMLStreamConstants.END_ELEMENT:
                    EndElement endElement = xmlEvent.asEndElement();
                    typeValue = null; // Covers empty elements such as <id type="id2"/>, which produce no CHARACTERS event.
                    System.out.println("XMLEvent.END_ELEMENT:   </" + endElement.getName() + ">");
                    break;
    
                case XMLStreamConstants.END_DOCUMENT:
                    System.out.println("XMLEvent.END_DOCUMENT");
                    break;
    
                default:
                    System.out.println("case default: Event Type = " + xmlEvent.getEventType());
                    break;
                }
            }
        }
    }
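
    To connect this back to the Connector object in the question, here is a sketch of a variant that buffers text in a StringBuilder until the matching END_ELEMENT instead of taking only the first CHARACTERS event, so it also copes with a parser that splits one text node into several events. The Connector class below is a hypothetical stand-in: the question only shows setId1, so setId2 and setId3 are assumptions.

```java
import java.io.StringReader;

import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.XMLEvent;

class ConnectorParser {

    // The XML from the question, inlined so the sketch needs no file.
    static final String SAMPLE = "<entity>\n"
            + "   <id type=\"id1\">8500123</id>\n"
            + "   <id type=\"id2\">8500124</id>\n"
            + "   <id type=\"id3\">8500125</id>\n"
            + "   <link idType=\"someId\">99369</link>\n"
            + "</entity>";

    /** Hypothetical stand-in for the OP's Connector (only setId1 appears in the question). */
    static class Connector {
        String id1, id2, id3;
        void setId1(String v) { id1 = v; }
        void setId2(String v) { id2 = v; }
        void setId3(String v) { id3 = v; }
    }

    static Connector parse(String xml) {
        Connector connector = new Connector();
        String typeValue = null;                  // type attribute of the element being read
        StringBuilder text = new StringBuilder(); // text collected since that start tag
        try {
            XMLEventReader reader = XMLInputFactory.newFactory()
                    .createXMLEventReader(new StringReader(xml));
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                switch (event.getEventType()) {
                case XMLStreamConstants.START_ELEMENT:
                    Attribute type = event.asStartElement().getAttributeByName(new QName("type"));
                    typeValue = (type != null) ? type.getValue() : null;
                    text.setLength(0); // start a fresh buffer for this element
                    break;
                case XMLStreamConstants.CHARACTERS:
                    // One text node may arrive as several CHARACTERS events; append them all.
                    text.append(event.asCharacters().getData());
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    if (typeValue != null) {
                        String value = text.toString().trim();
                        switch (typeValue) {
                        case "id1": connector.setId1(value); break;
                        case "id2": connector.setId2(value); break;
                        case "id3": connector.setId3(value); break;
                        default: break;
                        }
                    }
                    typeValue = null;
                    break;
                default:
                    break;
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return connector;
    }

    public static void main(String[] args) {
        Connector c = parse(SAMPLE);
        System.out.println(c.id1 + " " + c.id2 + " " + c.id3);
    }
}
```

    Parsing the question's XML this way yields id1=8500123, id2=8500124 and id3=8500125. Trimming the buffered text is a design choice that assumes leading or trailing whitespace in an id is never significant.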
    

    I added a few println() calls just to clarify how the file is processed by XMLEventReader. Here's the output:

    XMLEvent.START_DOCUMENT
    XMLEvent.START_ELEMENT: <entity> type=null
    XMLEvent.START_ELEMENT: <id> type=id1
    XMLEvent.END_ELEMENT:   </id>
    XMLEvent.START_ELEMENT: <id> type=id2
    XMLEvent.CHARACTERS:    data=[8500124]
    XMLEvent.END_ELEMENT:   </id>
    XMLEvent.START_ELEMENT: <id> type=id3
    XMLEvent.CHARACTERS:    data=[8500125]
    XMLEvent.END_ELEMENT:   </id>
    XMLEvent.START_ELEMENT: <link> type=null
    XMLEvent.END_ELEMENT:   </link>
    XMLEvent.END_ELEMENT:   </entity>
    XMLEvent.END_DOCUMENT
    

    Oracle provides a tutorial for StAX. While all the basic information is there, I found it a bit disorganized.