java, xml, dom, stax

Reading a big XML file using StAX and DOM


I need to read several big (200 MB to 500 MB) XML files, so I want to use StAX. My system has two modules: one reads the file (with StAX); the other (the 'parser' module) is supposed to receive a single entry of that XML and parse it using DOM. My XML files don't have a fixed structure, so I cannot use JAXB. How can I pass the 'parser' module a specific entry that I want it to parse? For example:

<Items>
    <Item>
        <name> .... </name>
        <price> ... </price>
    </Item>
    <Item>
        <name> .... </name>
        <price> ... </price>
    </Item>
</Items>

I want to use StAX to read that file, but each 'Item' entry should be passed to the 'parser' module.
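To make the hand-off concrete, here is a minimal sketch of what the 'parser' module's side could look like, assuming it receives each entry as an `org.w3c.dom.Element`. The class `ItemParserDemo` and the method `parseItem` are hypothetical names. Because the files have no fixed structure, the sketch collects the text of whatever direct child elements are present instead of looking up `name` and `price` explicitly.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ItemParserDemo {

    // Hypothetical 'parser' module entry point: receives one <Item> element and
    // maps each direct child element's name to its trimmed text content,
    // without assuming which children exist.
    public static Map<String, String> parseItem(Element item) {
        Map<String, String> fields = new LinkedHashMap<>();
        NodeList children = item.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                fields.put(child.getNodeName(), child.getTextContent().trim());
            }
        }
        return fields;
    }

    public static void main(String[] args) throws Exception {
        // Builds a single <Item> in memory to stand in for what the reader module would pass.
        String xml = "<Item><name>Pen</name><price>1.50</price></Item>";
        Element item = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        System.out.println(parseItem(item)); // prints {name=Pen, price=1.50}
    }
}
```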

Edit:
After a little more reading, I think I need a library that reads an XML file as a stream but parses each entry using DOM. Is there such a thing?


Solution

  • You could use a StAX (javax.xml.stream) parser and transform (javax.xml.transform) each section to a DOM node (org.w3c.dom):

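    import java.io.*;
    import javax.xml.stream.*;
    import javax.xml.transform.*;
    import javax.xml.transform.stax.StAXSource;
    import javax.xml.transform.dom.DOMResult;
    import org.w3c.dom.*;
    
    public class Demo {
    
        public static void main(String[] args) throws Exception  {
            XMLInputFactory xif = XMLInputFactory.newInstance();
            // An InputStream (rather than a FileReader) lets the parser honor the file's declared encoding
            try (InputStream in = new FileInputStream("input.xml")) {
                XMLStreamReader xsr = xif.createXMLStreamReader(in);
                xsr.nextTag(); // Advance to the root element (<Items>)
    
                TransformerFactory tf = TransformerFactory.newInstance();
                Transformer t = tf.newTransformer(); // identity transformer
                // nextTag() skips whitespace and stops on the next <Item> or on </Items>
                while(xsr.nextTag() == XMLStreamConstants.START_ELEMENT) {
                    DOMResult result = new DOMResult();
                    // Copies only the current <Item> subtree into a new DOM tree
                    t.transform(new StAXSource(xsr), result);
                    Node domNode = result.getNode();
                    // Hand domNode to the 'parser' module here
                }
                xsr.close();
            }
        }
    
    }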
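
If the reader and parser modules should stay decoupled, the same StAX-to-DOM transform can sit behind a callback. This is a sketch under assumptions, not the answer's own code: `StaxSplitter`, `split`, and `entryName` are hypothetical names, and a `java.util.function.Consumer<Element>` stands in for the 'parser' module. It swaps the `nextTag()` loop for an event-by-event loop, so entries are matched by element name at any depth (which may suit files without a fixed structure), and it re-checks the reader's current event after each transform instead of assuming exactly where the transform leaves the reader.

```java
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stax.StAXSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class StaxSplitter {

    // Streams through the document and passes every element whose local name
    // equals entryName (at any depth) to the parser callback as a DOM Element.
    public static void split(XMLStreamReader xsr, String entryName,
                             Consumer<Element> parser) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        while (true) {
            if (xsr.getEventType() == XMLStreamConstants.START_ELEMENT
                    && entryName.equals(xsr.getLocalName())) {
                DOMResult result = new DOMResult();
                // Copies just this element's subtree into a new DOM tree.
                t.transform(new StAXSource(xsr), result);
                Node node = result.getNode();
                parser.accept(node instanceof Document
                        ? ((Document) node).getDocumentElement()
                        : (Element) node);
                // Re-examine the current event: the transform may already have
                // moved the reader onto the next matching start tag.
                continue;
            }
            if (!xsr.hasNext()) {
                break;
            }
            xsr.next();
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Items><Item><name>Pen</name><price>1.50</price></Item>"
                + "<Item><name>Ink</name><price>3.00</price></Item></Items>";
        XMLStreamReader xsr = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            // The lambda stands in for the 'parser' module.
            split(xsr, "Item", item -> System.out.println(
                    item.getElementsByTagName("name").item(0).getTextContent()));
        } finally {
            xsr.close();
        }
    }
}
```

Because the callback receives a standalone `Element`, the reader module never needs to know what an entry contains. One consequence of the design: a matching element nested inside an entry that was already copied is not reported separately, since the transform consumes the whole subtree.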
    

    Also see: