
Is there a generic way of reading complex XML using SaxParser?


I am using SAXParser to read a large, complex XML file. I do not want to create model classes because I do not know exactly what data the XML will contain, so I am looking for a generic way to read the XML data into some sort of context object.

I have used a similar approach for JSON with Jackson, and it worked very well for me. Since I am new to the SAX parser, I do not fully understand how to achieve the same thing here. For complex inner values, I am unable to establish the parent-child relationship, and I am unable to associate attributes with the tags they belong to.

Following is the code I have so far:

ContextNode is my generic class for storing all the XML information using parent-child relationships:

import lombok.Getter;
import lombok.NoArgsConstructor;
import lombok.Setter;
import lombok.ToString;

import java.util.ArrayList;

@Getter
@Setter
@ToString
@NoArgsConstructor
public class ContextNode {
    protected String name;
    protected String value;
    protected ArrayList<ContextNode> children = new ArrayList<>();
    //Excluded from toString() so that printing a child does not recurse back into its parent.
    @ToString.Exclude
    protected ContextNode parent;

    //Constructor 1: To store the simple field information.
    public ContextNode(final String name, final String value) {
        this.name = name;
        this.value = value;
    }

    //Constructor 2: To store the complex field which has inner elements.
    public ContextNode(final ContextNode parent, final String name, final String value) {
        this(name, value);
        this.parent = parent;
    }
}

Following is the method in EventReader that parses the XML using SAX:

import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;

public class EventReader {
    //Reads the XML events and creates a pre-hash string from them.
    public static void xmlParser(final InputStream xmlStream) {
        final SAXParserFactory factory = SAXParserFactory.newInstance();

        try {
            final SAXParser saxParser = factory.newSAXParser();
            final SaxHandler handler = new SaxHandler();
            saxParser.parse(xmlStream, handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            e.printStackTrace();
        }
    }
}

Following is my SaxHandler:

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SaxHandler extends DefaultHandler {

    private final List<String> XML_IGNORE_FIELDS = Arrays.asList("person:personDocument","DocumentBody","DocumentList");
    private final List<String> EVENT_TYPES = Arrays.asList("person");
    private Map<String, String> XML_NAMESPACES = null;
    private ContextNode contextNode = null;
    private StringBuilder currentValue = new StringBuilder();

    @Override
    public void startDocument() {
        XML_NAMESPACES = new HashMap<>();
    }

    @Override
    public void startElement(final String uri, final String localName, final String qName, final Attributes attributes) {
        //For every new element in XML reset the StringBuilder.
        currentValue.setLength(0);

        if (qName.equalsIgnoreCase("person:personDocument")) {
            // Add the attributes and name-spaces to Map
            for (int att = 0; att < attributes.getLength(); att++) {

                if (attributes.getQName(att).contains(":")) {
                    //Find all Namespaces within the XML Header information and save it to the Map for future use.
                    XML_NAMESPACES.put(attributes.getQName(att).substring(attributes.getQName(att).indexOf(":") + 1), attributes.getValue(att));
                } else {
                    //Find all other attributes within XML and store this information within Map.
                    XML_NAMESPACES.put(attributes.getQName(att), attributes.getValue(att));
                }
            }
        } else if (EVENT_TYPES.contains(qName)) {
            contextNode = new ContextNode("type", qName);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        currentValue.append(ch, start, length);
    }

    @Override
    public void endElement(final String uri, final String localName, final String qName) {
        if (!XML_IGNORE_FIELDS.contains(qName)) {
            if (!EVENT_TYPES.contains(qName)) {
                System.out.println("QName : " + qName + " Value : " + currentValue);
                contextNode.children.add(new ContextNode(qName, currentValue.toString()));
            }
        }
    }

    @Override
    public void endDocument() {
        System.out.println(contextNode.getChildren().toString());
        System.out.println("End of Document");
    }
}
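Part of the difficulty is that SAX never hands you a tree: it reports a flat sequence of startElement and endElement callbacks, and nesting is only implied by their order, so the handler itself has to remember which elements are currently open. The diagnostic sketch below shows this on a shortened version of the sample XML. SaxEventTrace and trace are hypothetical names, the indentation is simply the number of currently open elements, and String.repeat assumes Java 11 or later.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Diagnostic sketch: records SAX start/end events, indented by nesting depth,
// to show that the parser delivers a flat event stream rather than a tree.
class SaxEventTrace {

    static List<String> trace(String xml) {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private int depth = 0; // number of elements currently open

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                events.add("  ".repeat(depth) + "start " + qName);
                depth++;
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                depth--;
                events.add("  ".repeat(depth) + "end " + qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return events;
    }

    public static void main(String[] args) {
        String xml = "<Person><name>Batman</name>"
                + "<hobbies><hobby>painting</hobby></hobbies></Person>";
        trace(xml).forEach(System.out::println);
    }
}
```

Each end event closes the most recently opened element, which is why a stack (for example an ArrayDeque) is a natural structure for tracking the current parent.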

Following is my test case, which calls the xmlParser method:

@Test
public void xmlReader() throws Exception {
    final InputStream xmlStream = getClass().getResourceAsStream("/xmlFileContents.xml");
    EventReader.xmlParser(xmlStream);
}

Following is the XML I need to read using a generic approach:

<?xml version="1.0" ?>
<person:personDocument xmlns:person="https://example.com" schemaVersion="1.2" creationDate="2020-03-03T13:07:51.709Z">
<DocumentBody>
    <DocumentList>
        <Person>
            <bithTime>2020-03-04T11:00:30.000+01:00</bithTime>
            <name>Batman</name>
            <Place>London</Place>
            <hobbies>
                <hobby>painting</hobby>
                <hobby>football</hobby>
            </hobbies>
            <jogging distance="10.3">daily</jogging>
            <purpose2>
                <id>1</id>
                <purpose>Dont know</purpose>
            </purpose2>
        </Person>
    </DocumentList>
</DocumentBody>
</person:personDocument>
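The root element declares the person namespace with xmlns:person. SaxHandler above picks that declaration up by scanning attribute qualified names, which works because SAXParserFactory.newInstance() returns a factory that is not namespace-aware by default: the declaration is then reported as an ordinary attribute and qName carries the prefix. If you enable namespace awareness instead, the parser reports declarations through the startPrefixMapping callback, and startElement receives the namespace URI and local name. A minimal sketch, where NamespaceAwareDemo, Result and parse are hypothetical names:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Sketch: with a namespace-aware parser, namespace declarations arrive through
// startPrefixMapping instead of as "xmlns:..." attributes.
class NamespaceAwareDemo {

    static final class Result {
        final Map<String, String> prefixes = new LinkedHashMap<>(); // prefix -> namespace URI
        final List<String> elements = new ArrayList<>();            // "{uri}localName" per element
    }

    static Result parse(String xml) {
        Result result = new Result();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startPrefixMapping(String prefix, String uri) {
                result.prefixes.put(prefix, uri);
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                result.elements.add("{" + uri + "}" + localName);
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // newInstance() is not namespace-aware by default
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<person:personDocument xmlns:person=\"https://example.com\" schemaVersion=\"1.2\">"
                + "<DocumentBody/></person:personDocument>";
        Result result = parse(xml);
        System.out.println(result.prefixes); // {person=https://example.com}
        System.out.println(result.elements);
    }
}
```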

Solution

  • Posting the answer in case it helps someone in the future:

    First, create a ContextNode class that holds each element's name, value, attributes, children, parent and namespaces:

    import lombok.Getter;
    import lombok.Setter;
    import lombok.ToString;
    
    import java.util.ArrayList;
    import java.util.Map;
    
    @Getter
    @Setter
    @ToString
    public class ContextNode {
        protected String name;
        protected String value;
        protected ArrayList<ContextNode> attributes = new ArrayList<>();
        protected ArrayList<ContextNode> children = new ArrayList<>();
        //Excluded from toString() to avoid endless recursion between parent and children.
        @ToString.Exclude
        protected ContextNode parent;
        protected Map<String, String> namespaces;
    
        //Creates a child node that shares its parent's namespace map.
        public ContextNode(final ContextNode parent, final String name, final String value) {
            this.parent = parent;
            this.name = name;
            this.value = value;
            this.namespaces = parent.namespaces;
        }
    
        //Creates the root node with the namespaces collected from the document header.
        public ContextNode(final Map<String, String> namespaces) {
            this.namespaces = namespaces;
        }
    }
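Keeping a parent reference is what lets any node answer "where am I in the document?". As an illustration, the minimal stand-in below (NodePathDemo and Node are hypothetical names, not part of the answer) rebuilds an element's path by following parent links up to the root. It also registers each node with its parent in the constructor, which is one way to keep both directions of the relationship in sync; the answer's handler does that step explicitly instead.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Illustration: rebuilding an element's path by following parent links.
class NodePathDemo {

    // Hypothetical, minimal stand-in for ContextNode (name, parent, children only).
    static final class Node {
        final String name;
        final Node parent;
        final List<Node> children = new ArrayList<>();

        Node(Node parent, String name) {
            this.parent = parent;
            this.name = name;
            if (parent != null) {
                parent.children.add(this); // keep both directions of the link in sync
            }
        }

        // Walks up to the root, pushing each name so the root ends up first.
        String path() {
            Deque<String> names = new ArrayDeque<>();
            for (Node n = this; n != null; n = n.parent) {
                names.push(n.name);
            }
            return String.join("/", names);
        }
    }

    public static void main(String[] args) {
        Node person = new Node(null, "Person");
        Node hobbies = new Node(person, "hobbies");
        Node hobby = new Node(hobbies, "hobby");
        System.out.println(hobby.path()); // Person/hobbies/hobby
    }
}
```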
    

    Then read the XML with a SAX handler and store the information in ContextNode objects:

    import lombok.Getter;
    import org.xml.sax.Attributes;
    import org.xml.sax.helpers.DefaultHandler;
    
    import java.util.*;
    
    public class SaxHandler extends DefaultHandler {
    
        //Variables needed to store the required information during the parsing of the XML document.
        private final Deque<String> path = new ArrayDeque<>();
        private final StringBuilder currentValue = new StringBuilder();
        private ContextNode currentNode = null;
        @Getter
        private ContextNode rootNode = null;
        private Map<String, String> currentAttributes;
        private final HashMap<String, String> contextHeader = new HashMap<>();
    
        @Override
        public void startElement(final String uri, final String localName, final String qName, final Attributes attributes) {
            //Put every XML tag on the stack when the tag starts.
            path.push(qName);
    
            //Reset the text buffer and the attributes for every element.
            currentValue.setLength(0);
            currentAttributes = new HashMap<>();
    
            //Attributes starting with "xmlns:" are namespace declarations; everything else is a regular attribute.
            for (int att = 0; att < attributes.getLength(); att++) {
                final String attName = attributes.getQName(att);
                if (attName.startsWith("xmlns:")) {
                    contextHeader.put(attName.substring(attName.indexOf(':') + 1), attributes.getValue(att));
                } else {
                    currentAttributes.put(attName, attributes.getValue(att).trim());
                }
            }
    
            if (rootNode == null) {
                //The first element becomes the root; its tag name is stored as the "type" child.
                rootNode = new ContextNode(contextHeader);
                rootNode.children.add(new ContextNode(rootNode, "type", qName));
                currentNode = rootNode;
            } else {
                //Every other element becomes a child of the element that is currently open.
                final ContextNode n = new ContextNode(currentNode, qName, null);
                currentNode.children.add(n);
                currentNode = n;
            }
    
            //Store the element's attributes on its node so they stay linked to their tag.
            for (Map.Entry<String, String> entry : currentAttributes.entrySet()) {
                currentNode.attributes.add(new ContextNode(currentNode, entry.getKey(), entry.getValue()));
            }
        }
    
        @Override
        public void characters(char[] ch, int start, int length) {
            currentValue.append(ch, start, length);
        }
    
        @Override
        public void endElement(final String uri, final String localName, final String qName) {
            //Keep the element's text, ignoring whitespace-only text between tags.
            final String text = currentValue.toString().trim();
            if (!text.isEmpty()) {
                currentNode.value = text;
            }
    
            //Move back up to the parent, reset the value for the next element and pop the finished tag.
            currentNode = currentNode.parent;
            currentValue.setLength(0);
            path.pop();
        }
    
        @Override
        public void endDocument() {
            System.out.println("completed reading");
            System.out.println(rootNode);
        }
    
        //Current element path, root first (push() adds to the head, so reverse before joining). Handy for logging.
        private String path() {
            final List<String> names = new ArrayList<>(path);
            Collections.reverse(names);
            return String.join("/", names);
        }
    }
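The handler above depends on Lombok and on the rest of the project. To try the technique in isolation, here is a condensed, self-contained sketch of the same idea: the element on top of a stack is the parent of the next element that starts. XmlTreeDemo, Node, TreeHandler, parse and print are hypothetical names; attributes are kept in a plain map instead of as ContextNode entries, and mixed content (text interleaved with child elements) is not handled.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.*;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Condensed, Lombok-free sketch of stack-based tree building with SAX.
class XmlTreeDemo {

    // Hypothetical, simplified stand-in for ContextNode.
    static final class Node {
        final String name;
        String value; // trimmed text content, or null if there is none
        final Map<String, String> attributes = new LinkedHashMap<>();
        final List<Node> children = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }
    }

    static final class TreeHandler extends DefaultHandler {
        private final Deque<Node> open = new ArrayDeque<>(); // open elements, innermost on top
        private final StringBuilder text = new StringBuilder();
        Node root;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            Node node = new Node(qName);
            for (int i = 0; i < atts.getLength(); i++) {
                node.attributes.put(atts.getQName(i), atts.getValue(i));
            }
            if (open.isEmpty()) {
                root = node;
            } else {
                open.peek().children.add(node); // the element on top of the stack is the parent
            }
            open.push(node);
            text.setLength(0);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            Node node = open.pop();
            String trimmed = text.toString().trim();
            if (!trimmed.isEmpty()) {
                node.value = trimmed;
            }
            text.setLength(0);
        }
    }

    static Node parse(String xml) {
        TreeHandler handler = new TreeHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return handler.root;
    }

    // Prints one line per element, indented two spaces per level.
    static void print(Node node, String indent) {
        System.out.println(indent + node.name + node.attributes
                + (node.value == null ? "" : " = " + node.value));
        for (Node child : node.children) {
            print(child, indent + "  ");
        }
    }

    public static void main(String[] args) {
        String xml = "<Person><name>Batman</name>"
                + "<jogging distance=\"10.3\">daily</jogging>"
                + "<hobbies><hobby>painting</hobby><hobby>football</hobby></hobbies></Person>";
        print(parse(xml), "");
    }
}
```

For the sample input in main, the jogging element is printed as jogging{distance=10.3} = daily, so the attribute and the text end up on the same node.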
    
    
    

    You may need to make additional changes for your particular requirements; this is just a sample to give you an idea.
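If you do not need the tree itself, a path stack on its own gives a flat view: every text value and attribute is recorded under the slash-separated path of the element it belongs to, with repeated elements collected into a list. This is a sketch under the assumption that element paths are meaningful keys for your data. XmlPathFlattener and flatten are hypothetical names, and keying attributes as path/@name is an XPath-like convention chosen here for illustration. Because push() adds to the head of an ArrayDeque, joining the deque directly would list the innermost element first, so the sketch reverses the names before joining.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.*;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Sketch: flatten XML into "element path -> text values" without building a tree.
class XmlPathFlattener extends DefaultHandler {

    private final Deque<String> path = new ArrayDeque<>(); // open elements, innermost first
    private final StringBuilder text = new StringBuilder();
    private final Map<String, List<String>> values = new LinkedHashMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        path.push(qName);
        text.setLength(0);
        // Record each attribute under "path/@name".
        for (int i = 0; i < atts.getLength(); i++) {
            add(currentPath() + "/@" + atts.getQName(i), atts.getValue(i));
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String trimmed = text.toString().trim();
        if (!trimmed.isEmpty()) {
            add(currentPath(), trimmed);
        }
        text.setLength(0);
        path.pop();
    }

    // Root-first path: the deque iterates innermost first, so reverse a copy.
    private String currentPath() {
        List<String> names = new ArrayList<>(path);
        Collections.reverse(names);
        return String.join("/", names);
    }

    private void add(String key, String value) {
        values.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    static Map<String, List<String>> flatten(String xml) {
        XmlPathFlattener handler = new XmlPathFlattener();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return handler.values;
    }

    public static void main(String[] args) {
        String xml = "<Person><name>Batman</name>"
                + "<jogging distance=\"10.3\">daily</jogging>"
                + "<hobbies><hobby>painting</hobby><hobby>football</hobby></hobbies></Person>";
        flatten(xml).forEach((key, vals) -> System.out.println(key + " = " + vals));
    }
}
```

For the sample input, the two hobby elements end up as a single entry, Person/hobbies/hobby = [painting, football].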