java, xml, xsd

Is there a Java library for splitting a large XML file into smaller valid XML files with a maximum size in KB?


I need to find a Java library with the methods to build a service that takes a very large and complex XML file (with an XSD) and splits it into smaller XML files of a defined size (for example, 200 KB per file).

Is there any library that does this? Or is there already some code on GitHub that does this?

In my XML there is a big list of elements at a specific point in the file. My idea is to split the big XML into smaller XML files that contain the same list, but with its elements divided among them, so as to divide the workload.

This is a schematic image


Solution

  • There is no automatic way to split a big XML document into several smaller XML documents.

    As an extreme simplification, a single XML document represents a single object with properties. Splitting it into different XML documents means splitting a single object into multiple objects. This is not something that can be done automatically.

    Here is a simple example. Imagine you have this XML:

    <note>
      <to>Tove</to>
      <from>Jani</from>
      <heading>Reminder</heading>
      <body>Don't forget me this weekend!</body>
    </note>
    

    How do you split it? Is the following a valid way to split it? (How to split it and how to recombine it is a business decision.)

    <note>
      <to>Tove</to>
      <from>Jani</from>
    </note>
    
    <note>
      <heading>Reminder</heading>
      <body>Don't forget me this weekend!</body>
    </note>
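Once the business rule is decided, it has to be coded by hand. For the rule described in the question (a big list of repeated elements, where every output document keeps everything else and receives a slice of the list), a minimal DOM sketch could look like the following. It assumes the list entries can be found by their local element name (the itemName parameter), that entries are not nested inside each other, and that the whole document fits in memory. ListSplitter and split are hypothetical names, not an existing library. The size limit is only an estimate (serialized UTF-8 length of each entry plus the document without entries), comments or a DOCTYPE outside the root element are not copied, and the parts are not validated against the XSD (for example, a schema with minOccurs on the list could still reject them).

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: splits the repeated list entries of one document into
// several documents that keep everything else unchanged.
class ListSplitter {
    static List<String> split(String xml, String itemName, int maxBytes) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // lets getElementsByTagNameNS match on the local name
        DocumentBuilder builder = dbf.newDocumentBuilder();
        Document source = builder.parse(new InputSource(new StringReader(xml)));

        List<Node> items = items(source, itemName);
        // Estimated size of the document with all entries removed (root, headers, ...).
        int baseBytes = bytes(keepOnly(builder, source, itemName, 0, 0));

        // Greedily group consecutive entries until the estimate would exceed maxBytes;
        // every part gets at least one entry, even if that entry alone is too big.
        List<String> parts = new ArrayList<>();
        int start = 0;
        int used = baseBytes;
        for (int i = 0; i < items.size(); i++) {
            int size = bytes(items.get(i));
            if (i > start && used + size > maxBytes) {
                parts.add(serialize(keepOnly(builder, source, itemName, start, i)));
                start = i;
                used = baseBytes;
            }
            used += size;
        }
        if (start < items.size()) {
            parts.add(serialize(keepOnly(builder, source, itemName, start, items.size())));
        }
        return parts;
    }

    // Copies the root element into a fresh document and drops entries outside [from, to).
    private static Document keepOnly(DocumentBuilder builder, Document source,
                                     String itemName, int from, int to) {
        Document copy = builder.newDocument();
        copy.appendChild(copy.importNode(source.getDocumentElement(), true));
        List<Node> items = items(copy, itemName);
        for (int i = 0; i < items.size(); i++) {
            if (i < from || i >= to) {
                Node item = items.get(i);
                item.getParentNode().removeChild(item);
            }
        }
        return copy;
    }

    // getElementsByTagNameNS returns a live list, so copy it before removing nodes.
    private static List<Node> items(Document doc, String itemName) {
        NodeList live = doc.getElementsByTagNameNS("*", itemName);
        List<Node> result = new ArrayList<>();
        for (int i = 0; i < live.getLength(); i++) {
            result.add(live.item(i));
        }
        return result;
    }

    private static int bytes(Node node) throws Exception {
        return serialize(node).getBytes(StandardCharsets.UTF_8).length;
    }

    // Serializes a node without the XML declaration.
    private static String serialize(Node node) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(node), new StreamResult(out));
        return out.toString();
    }
}
```

For a 200 KB target this would be called as ListSplitter.split(xml, "item", 200 * 1024), where "item" stands for whatever the list element is really called. For files too big for DOM, the same grouping idea can be applied on top of a streaming parser.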
    

    If the problem is not splitting a big XML document into smaller XML documents, but splitting a single big file into smaller files, you can split it as follows (note that neither piece is well-formed XML on its own):

    <note>
      <to>Tove</to>
      <from>Jani</from>
    

    and

      <heading>Reminder</heading>
      <body>Don't forget me this weekend!</body>
    </note>
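A plain byte-level split into pieces of at most a given size (for example 200 * 1024 bytes) needs nothing beyond java.nio.file. This is a minimal sketch; FileChunker, split, join and the ".partN" naming scheme are hypothetical. A cut may fall in the middle of a tag or even of a multi-byte UTF-8 character, so the pieces have to be concatenated back in their original order before any XML parser can read them.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: byte-level split/join, unaware of XML structure.
class FileChunker {
    // Writes source into numbered part files of at most maxBytes bytes each.
    static List<Path> split(Path source, Path targetDir, int maxBytes) throws IOException {
        List<Path> parts = new ArrayList<>();
        byte[] buffer = new byte[maxBytes];
        try (InputStream in = Files.newInputStream(source)) {
            int n;
            // readNBytes (Java 9+) fills the buffer unless the end of the file is reached;
            // it returns 0 once nothing is left.
            while ((n = in.readNBytes(buffer, 0, maxBytes)) > 0) {
                Path part = targetDir.resolve(source.getFileName() + ".part" + parts.size());
                try (OutputStream out = Files.newOutputStream(part)) {
                    out.write(buffer, 0, n);
                }
                parts.add(part);
            }
        }
        return parts;
    }

    // Concatenates the parts, in order, back into a single file.
    static void join(List<Path> parts, Path target) throws IOException {
        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path part : parts) {
                Files.copy(part, out);
            }
        }
    }
}
```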
    

    But if the problem is the size of the file when sending it over the internet or storing it, also consider compressing it. XML is verbose and repetitive, so a compressed XML file is usually much smaller than the original. If necessary, you can then split the compressed file.
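With the JDK alone, gzip compression can be done in a streaming way, so even a very large file is never fully loaded into memory. A minimal sketch; XmlGzip, compress and decompress are hypothetical names, while GZIPOutputStream, GZIPInputStream and InputStream.transferTo (Java 9+) are standard JDK APIs.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: gzip a file and restore it, streaming in both directions.
class XmlGzip {
    static void compress(Path source, Path target) throws IOException {
        try (InputStream in = Files.newInputStream(source);
             OutputStream out = new GZIPOutputStream(Files.newOutputStream(target))) {
            in.transferTo(out); // copies through a small buffer
        } // closing the GZIPOutputStream writes the gzip trailer
    }

    static void decompress(Path source, Path target) throws IOException {
        try (InputStream in = new GZIPInputStream(Files.newInputStream(source));
             OutputStream out = Files.newOutputStream(target)) {
            in.transferTo(out);
        }
    }
}
```

The resulting .gz file is a single opaque stream: the receiver has to decompress it completely before parsing.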

    If instead the problem is holding the whole file in memory, simply don't do that. Use a SAX parser instead of a DOM parser, so that only a small portion of the original XML is held in memory at any time. SAX is described as follows:

    SAX (Simple API for XML) is an event-driven online algorithm for parsing XML documents, with an API developed by the XML-DEV mailing list. SAX provides a mechanism for reading data from an XML document that is an alternative to that provided by the Document Object Model (DOM). Where the DOM operates on the document as a whole, building the full abstract syntax tree of an XML document for convenience of the user, SAX parsers operate on each piece of the XML document sequentially, issuing parsing events while making a single pass through the input stream.
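With the JDK's built-in SAX parser, the entries of the big list can be handled one at a time while the file is read in a single pass, so memory use depends on the size of one entry rather than the whole file. A minimal sketch, assuming each entry contains only text and entries are not nested; SaxItemReader and forEach are hypothetical names, and the callback would be where each entry is processed or written to the current output file.

```java
import java.io.InputStream;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: streams the text of every <itemName> element to a callback.
class SaxItemReader {
    static void forEach(InputStream xml, String itemName, Consumer<String> action) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // so that localName is filled in
        factory.newSAXParser().parse(xml, new DefaultHandler() {
            private StringBuilder text; // non-null only while inside an entry

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (itemName.equals(localName)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // The parser may deliver one text node in several chunks.
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (itemName.equals(localName)) {
                    action.accept(text.toString());
                    text = null; // the entry can now be garbage-collected
                }
            }
        });
    }
}
```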