Java 8 - Convert a very large XML input into JSON (with an extra attribute)


I would like to transform a huge XML document into JSON. Each time a specific XML tag is encountered, I would like to convert that element to JSON and add a simple counter to it.

The input XML is very large, so loading it into an in-memory JSON tree is not possible.

So, for example, <xml><car>...</car><car>...</car>...

is converted to

{"number":2,"car":{"name":"car1"}}
{"number":3,"car":{"name":"car2"}}

Solution

  • Thanks to Andreas, I finally found a solution for processing a huge XML file and converting the matching XML elements to JSON.

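    import java.io.StringReader;
    import java.io.StringWriter;
    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.XMLStreamConstants;
    import javax.xml.stream.XMLStreamReader;
    import javax.xml.transform.OutputKeys;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.stax.StAXSource;
    import javax.xml.transform.stream.StreamResult;
    import org.json.JSONObject;
    import org.json.XML;

    String testCars = "<root><car><name>car1</name></car><other><something>Unknown</something></other><car><name>car2</name></car></root>";
    // Not used below: every child element of <root> is converted, not only <car>.
    String startElement = "car";
    int volgnummer = 1; // pre-incremented below, so the first sequence number is 2
    XMLInputFactory factory = XMLInputFactory.newInstance();
    try {
        XMLStreamReader streamReader = factory.createXMLStreamReader(new StringReader(testCars));
        streamReader.nextTag(); // advance to <root>
        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer t = tf.newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        streamReader.nextTag(); // advance to the first child of <root>
        // After transform() the reader may already be on the next start tag (no
        // whitespace between elements). Otherwise nextTag() skips ahead to it,
        // unless the reader is on the closing tag of <root>, which ends the loop.
        while (streamReader.isStartElement() ||
               (!streamReader.isEndElement() && streamReader.hasNext() &&
                streamReader.nextTag() == XMLStreamConstants.START_ELEMENT)) {
            StringWriter writer = new StringWriter();
            StreamResult result = new StreamResult(writer);
            // Copy only the current element and its subtree into the writer.
            t.transform(new StAXSource(streamReader), result);
            JSONObject jsonObject = XML.toJSONObject(writer.toString());
            jsonObject.put("sequence", ++volgnummer);
            System.out.println("XmlChunkToJson: " + jsonObject.toString());
        }
    } catch (Exception e) {
        e.printStackTrace();
    }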
    

    XML input:

    <?xml version="1.0"?>
    <root>
      <car>
        <name>car1</name>
      </car>
      <other>
        <something>Unknown</something>
      </other>
      <car>
        <name>car2</name>
      </car>
    </root>
    

    Output JSON:

    XmlChunkToJson: {"sequence":2,"car":{"name":"car1"}}
    XmlChunkToJson: {"sequence":3,"other":{"something":"Unknown"}}
    XmlChunkToJson: {"sequence":4,"car":{"name":"car2"}}
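
The output above still contains the <other> element, because the loop converts every child of <root>. If only one tag should be converted, as the question asks, the same StAXSource approach can be combined with a name check. The following is a minimal sketch under some assumptions, not a definitive implementation: the class name ElementChunker, the method forEachElement and the callback parameter sink are hypothetical names introduced here, the counter starts at 1 rather than 2, and elements are matched by local name at any depth, not only directly under the root. Each chunk is handed to the callback as soon as it is read, so nothing accumulates in memory. The JSON step is left out so that the sketch only needs the JDK.

```java
import java.io.Reader;
import java.io.StringWriter;
import java.util.function.BiConsumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;

// Hypothetical helper, not part of the original answer.
final class ElementChunker {

    /**
     * Streams through the XML and passes every element whose local name equals
     * elementName to the sink, serialized as an XML string, together with a
     * running counter that starts at 1. Returns the number of matches.
     */
    static int forEachElement(Reader xml, String elementName,
                              BiConsumer<Integer, String> sink) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(xml);
            Transformer copier = TransformerFactory.newInstance().newTransformer();
            copier.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            int count = 0;
            while (true) {
                if (reader.isStartElement()
                        && elementName.equals(reader.getLocalName())) {
                    StringWriter chunk = new StringWriter();
                    // Copies the current element and its subtree only.
                    copier.transform(new StAXSource(reader), new StreamResult(chunk));
                    sink.accept(++count, chunk.toString());
                    // transform() has consumed the element, so re-check the
                    // current event instead of advancing past it.
                    continue;
                }
                if (!reader.hasNext()) {
                    break; // END_DOCUMENT reached
                }
                reader.next();
            }
            reader.close();
            return count;
        } catch (Exception e) {
            throw new IllegalStateException("Failed to process XML", e);
        }
    }
}
```

For a large file, the Reader could come from Files.newBufferedReader, and the callback could call XML.toJSONObject(chunk), put the counter into the resulting JSONObject and print it, as in the solution above.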