Tags: java, xml, xpath, stax

Reading Huge XML File using StAX and XPath


The input file is about 10 GB in size and contains thousands of transactions in XML format. The requirement is to pick out each transaction's XML based on user input and send it to a processing system.

Sample content of the file:

<transactions>
    <txn id="1">
      <name> product 1</name>
      <price>29.99</price>
    </txn>

    <txn id="2">
      <name> product 2</name>
      <price>59.59</price>
    </txn>
</transactions>

The (technical) user is expected to supply the tag name as input, for example <txn>.

We would like to make this solution more generic. The file content might differ, so users should be able to supply an XPath expression such as "//transactions/txn" to pick individual transactions.

There are a few technical things we have to consider here:

  • The file can be in a shared location or on an FTP server
  • Since the file is huge, we can't load the entire file into JVM memory

Can we use a StAX parser for this scenario? It has to take an XPath expression as input and pick/select the transaction XML.

Looking for suggestions. Thanks in advance.


Solution

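  • StAX and XPath are very different things. StAX is a very fast streaming XML parser, but it reads the document in the forward direction only. XPath can navigate in both directions (for example through the parent and preceding-sibling axes), so a general XPath evaluator needs access to the whole document tree. If you want XPath, Java has a separate library for that (javax.xml.xpath), and it is typically evaluated against an in-memory tree such as a DOM rather than against a StAX stream.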

    Take a look at this question for a very similar discussion: "Is there any XPath processor for SAX model?"
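    One common way to combine the two is to let StAX do the streaming and apply XPath only to one small fragment at a time. The following is a minimal sketch of that idea, not a definitive implementation. The class FragmentStreamer and its methods stream and evaluate are hypothetical names invented for this illustration. It assumes the user-supplied selector is a plain absolute child path such as "/transactions/txn" (not a full XPath expression like "//transactions/txn"), that each matching element is small enough to hold in memory, and that fragments do not depend on namespace declarations made on ancestor elements.

```java
import java.io.InputStream;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

/** Hypothetical helper: streams a large XML input and hands out one matching element at a time. */
final class FragmentStreamer {

    /**
     * Calls onFragment with the serialized XML of every element whose absolute path
     * equals simplePath, e.g. "/transactions/txn". Only plain child steps are
     * supported (an assumption of this sketch); this is not an XPath engine.
     */
    static void stream(InputStream in, String simplePath, Consumer<String> onFragment)
            throws XMLStreamException {
        XMLInputFactory inputFactory = XMLInputFactory.newInstance();
        // Disable DTDs and external entities, since the file comes from outside.
        inputFactory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        inputFactory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
        XMLEventReader reader = inputFactory.createXMLEventReader(in);
        Deque<String> path = new ArrayDeque<>(); // local names from the root down to the current element
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()) {
                    path.addLast(event.asStartElement().getName().getLocalPart());
                    if (("/" + String.join("/", path)).equals(simplePath)) {
                        onFragment.accept(copyElement(event, reader, outputFactory));
                        path.removeLast(); // copyElement already consumed the matching end tag
                    }
                } else if (event.isEndElement()) {
                    path.removeLast();
                }
            }
        } finally {
            reader.close();
        }
    }

    /** Serializes the element that begins at 'start', reading events up to its matching end tag. */
    private static String copyElement(XMLEvent start, XMLEventReader reader,
                                      XMLOutputFactory outputFactory) throws XMLStreamException {
        StringWriter buffer = new StringWriter();
        XMLEventWriter writer = outputFactory.createXMLEventWriter(buffer);
        XMLEvent event = start;
        int depth = 0;
        while (true) {
            writer.add(event);
            if (event.isStartElement()) {
                depth++;
            } else if (event.isEndElement()) {
                depth--;
            }
            if (depth == 0) {
                break; // the end tag of the starting element has been written
            }
            event = reader.nextEvent();
        }
        writer.close();
        return buffer.toString();
    }

    /** Evaluates an XPath expression against one small fragment and returns its string value. */
    static String evaluate(String fragmentXml, String xpath) throws XPathExpressionException {
        return XPathFactory.newInstance().newXPath()
                .evaluate(xpath, new InputSource(new StringReader(fragmentXml)));
    }
}
```

    With the sample file from the question, FragmentStreamer.stream(in, "/transactions/txn", fragment -> ...) would call the callback once per <txn> element, and FragmentStreamer.evaluate(fragment, "/txn/price") could then read a value from that fragment. Because the input is just an InputStream, it can come from a shared location or an FTP client library without the whole file ever being held in memory. Supporting arbitrary XPath (descendant steps, predicates, reverse axes) over the stream would need a dedicated streaming XPath processor, which is what the linked discussion is about.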