Java SAX DefaultHandler parsing get tag value with several tags with the same name


I have this input XML:

<REQUEST>
    <ELEMENT>element inside request</ELEMENT>
    <NUMBER>250</NUMBER>
    <LIST>
          <ELEMENT>element inside list</ELEMENT>
          <LETTER>A</LETTER>
    </LIST>
    <OTHER1>other 1</OTHER1>
</REQUEST>

I'm extending DefaultHandler in a class that I use to get the values.

This is my class:

public class MyHandler extends DefaultHandler {
    private String elementName = null;
    private boolean bElement = false;

    private String element = null;

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (bElement){
            elementName = new String(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if("ELEMENT".equalsIgnoreCase(qName) && elementName != null){
            element = elementName;
        }
        bElement = false;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
        if(qName.equalsIgnoreCase("ELEMENT")){
            bElement = true;
        }
    }

    public String getElement() {
        return this.element;
    }

}

And I have this logic to get the element value (request is a HttpServletRequest):

ByteArrayOutputStream baos = new ByteArrayOutputStream();
byte[] buf = new byte[1024];
int n = 0;
while ((n = request.getInputStream().read(buf)) >= 0) {
    baos.write(buf, 0, n);
}
byte[] content = baos.toByteArray();
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
MyHandler handler = new MyHandler();
saxParser.parse(new ByteArrayInputStream(content), handler);
return handler.getElement();

When I send the XML above, I get the value inside REQUEST->LIST->ELEMENT, that is, the String:

element inside list

But I want to get the String inside the first ELEMENT tag:

element inside request

What do I need to add to the MyHandler class to get the ELEMENT value inside REQUEST->ELEMENT and not the other ELEMENT?


Solution

  • You will need to keep track of the full position in the XML, i.e. the whole path of currently open tags rather than just the name of the current tag.

    NB - This is completely untested code.

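    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.Iterator;

    import org.xml.sax.Attributes;
    import org.xml.sax.SAXException;
    import org.xml.sax.helpers.DefaultHandler;

    public class MyHandler extends DefaultHandler {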
    
        // Use a Deque to track the current position in the XML.
        private Deque<String> position = new ArrayDeque<>();
        // My data.
        private StringBuilder data = new StringBuilder();
    
        private String element = null;
    
        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            if (match()) {
                // Append to my buffer.
                data.append(ch, start, length);
            }
        }
    
        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
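            // If the tag being closed is the one we want, keep its text (first occurrence only).
            if (match() && element == null) {
                element = data.toString();
            }
            // Ending a tag - pop it from end.
            position.removeLast();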
        }
    
        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
            // Starting a tag - push it at the end.
            position.addLast(qName);
        }
    
        public String getElement() {
            return this.element;
        }
    
        // Specifically looking for the REQUEST.ELEMENT - not the REQUEST.LIST.ELEMENT
        private final String[] lookingFor = {"REQUEST", "ELEMENT"};
    
        private boolean match() {
            // Must be that deep.
            if (position.size() == lookingFor.length) {
                // Must match.
                Iterator<String> match = position.iterator();
                for (int i = 0; i < lookingFor.length; i++) {
                    // Match?
                    if (!match.next().equals(lookingFor[i])) {
                        return false;
                    }
                }
            } else {
                // Wrong depth.
                return false;
            }
            // No mismatch -> match!
            return true;
        }
    
    }
    

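    Also, as an aside, you should always append when characters is called: the parser may deliver the text of a single element in several chunks, so assigning instead of appending can keep only the last chunk.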
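
To see the path-tracking idea run end to end, here is a minimal, self-contained sketch. The names `PathHandlerDemo`, `PathHandler`, `extract` and the `"REQUEST/ELEMENT"` path string are made up for this illustration and are not part of the answer above. It assumes a non-namespace-aware parser (the `SAXParserFactory` default), so `qName` is the plain tag name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class PathHandlerDemo {

    // Hypothetical handler: captures the text of the first element
    // whose path from the root equals targetPath, e.g. "REQUEST/ELEMENT".
    static class PathHandler extends DefaultHandler {
        private final String targetPath;
        private final Deque<String> position = new ArrayDeque<>();
        private final StringBuilder data = new StringBuilder();
        private String element = null;

        PathHandler(String targetPath) {
            this.targetPath = targetPath;
        }

        private boolean match() {
            // ArrayDeque iterates head to tail, so with addLast the root comes first.
            return String.join("/", position).equals(targetPath);
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            position.addLast(qName);
            if (match()) {
                data.setLength(0); // start with an empty buffer for this element
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Always append: one element's text may arrive in several calls.
            if (match()) {
                data.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (match() && element == null) {
                element = data.toString();
            }
            position.removeLast();
        }

        String getElement() {
            return element;
        }
    }

    // Parses the XML string and returns the text at targetPath, or null if absent.
    static String extract(String xml, String targetPath) throws Exception {
        PathHandler handler = new PathHandler(targetPath);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.getElement();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<REQUEST>"
                + "<ELEMENT>element inside request</ELEMENT>"
                + "<NUMBER>250</NUMBER>"
                + "<LIST><ELEMENT>element inside list</ELEMENT><LETTER>A</LETTER></LIST>"
                + "<OTHER1>other 1</OTHER1>"
                + "</REQUEST>";
        System.out.println(extract(xml, "REQUEST/ELEMENT"));
    }
}
```

Joining the stack into a single path string is only one way to do the comparison. It works here because XML element names cannot contain `/`. Comparing the stack entry by entry, as in the answer, does the same job.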