java xml api jdom javolution

XML API for best performance


I have an application that works with a lot of XML data, so I want to ask which API is best for handling XML in Java. Today I'm using the W3C DOM API, and for performance reasons I want to migrate to another API. I build XML from scratch, run a lot of transforms, import into databases (MySQL, MSSQL, etc.), export from the database to HTML, modify those XML documents, and more.

Is JDOM the best option? Do you know of anything better than JDOM? I have read about Javolution. Does anybody use it?

Which API do you recommend?


Solution

  • If you have vast amounts of data, the main thing is to avoid loading it all into memory at once (because it will use a vast amount of memory, and because it prevents you from overlapping I/O and processing). Sadly, I believe most DOM and DOM-like libraries (like DOM4J) do just that, so they are not well suited to processing vast amounts of XML efficiently.

    Instead, look at using a streaming API, like SAX or StAX. StAX is, in my experience, usually easier to use.

    There are other APIs that try to give you the convenience of DOM with the performance of SAX. Javolution might be one; VTD-XML is another. But to be honest, I find StAX quite easy to work with: it's basically a fancy stream, so you think about it the same way as reading a text file from a stream.

    One thing you might try is combining JAXB with StAX. The idea is that you stream the file using StAX, then use JAXB to unmarshal chunks within it. For instance, if you were processing an Atom feed, you could open it, read past the header, then work in a loop unmarshalling entry elements to objects one at a time. This only really works if your format consists of a sequence of independent elements, like Atom; it would be largely useless on something richer like XHTML. You can see examples of this in the JAXB reference implementation and a guy's blog post.
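To make the streaming idea concrete, here is a minimal sketch using only the StAX API that ships with the JDK (`javax.xml.stream`). It walks an Atom-like feed and pulls out the title of each entry without ever building a tree of the whole document. The class name `StaxTitles`, the method `readEntryTitles`, and the tiny sample feed are made up for illustration. It also uses plain StAX rather than the JAXB combination described above, and it assumes each `title` contains text only.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class, not part of any library.
class StaxTitles {

    // Streams through the input and collects the text of every <title>
    // found inside an <entry>. Only the current event is held by the parser.
    static List<String> readEntryTitles(Reader in) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Turn off DTDs and external entities, in case the input is untrusted.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);

        XMLStreamReader reader = factory.createXMLStreamReader(in);
        List<String> titles = new ArrayList<>();
        boolean inEntry = false;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if (name.equals("entry")) {
                        inEntry = true;
                    } else if (inEntry && name.equals("title")) {
                        // Assumes text-only content; getElementText() throws
                        // if the element contains child elements.
                        titles.add(reader.getElementText());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && reader.getLocalName().equals("entry")) {
                    inEntry = false;
                }
            }
        } finally {
            reader.close();
        }
        return titles;
    }

    public static void main(String[] args) throws XMLStreamException {
        String feed = "<feed><title>Feed</title>"
                + "<entry><title>First</title></entry>"
                + "<entry><title>Second</title></entry></feed>";
        System.out.println(readEntryTitles(new StringReader(feed))); // prints [First, Second]
    }
}
```

For a really large file you would pass a `FileReader` or an `InputStream` instead of a `StringReader`, and handle each entry as it arrives (write it to the database, say) rather than collecting everything in a list. The list is only there to keep the sketch short.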