Streaming XML in Java


I am trying to read a large XML file. I only want to read the car owners, and I can't load the whole XML into memory. How can I do that?

The XML file:

<root>
    <message>
        <car>
            <owner>adam</owner>
        </car>
        <desk>
            <owner>sam</owner>
            <game>
                <owner>dorothy</owner>
            </game>
            <pen>
                <owner>dorothy</owner>
            </pen>
        </desk>
    </message>
</root>

For example, this code doesn't know exactly what it is reading. How can we be sure that we are reading car owners?

XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();
XMLEventReader reader = xmlInputFactory.createXMLEventReader(new FileInputStream(entry.toFile()));

while (reader.hasNext()) {
    XMLEvent nextEvent = reader.nextEvent();

    if (nextEvent.isStartElement()) {
        StartElement startElement = nextEvent.asStartElement();
        log.info(startElement.getName().toString());

        switch (startElement.getName().getLocalPart()) {
            case "owner":
                // whose owner is this? car, desk, game or pen?
                break;
        }
    }
}

Solution

  • A simple but viable solution is to create a small state machine: handle events as they arrive and update the state accordingly.

    1. On entering a car node, record that you are inside a car.
    2. On entering an owner node while inside a car node, store that owner as the car's owner.
    3. On exiting the car node, return the car-owner pair.
    4. Repeat for every car, tracking nesting and/or node depth so that only car > owner is accepted.
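
The steps above could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the class name CarOwnerReader and the method readCarOwners are hypothetical, and it assumes each owner element contains only text. Instead of a single "inside car" flag it keeps a stack of open element names, so an owner is accepted only when its direct parent is car.

```java
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;
import java.io.InputStream;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical helper class, named for illustration only.
public class CarOwnerReader {

    // Returns the text of every <owner> whose direct parent is <car>.
    // Assumption: <owner> holds plain text and no child elements.
    public static List<String> readCarOwners(InputStream in) throws XMLStreamException {
        List<String> owners = new ArrayList<>();
        Deque<String> path = new ArrayDeque<>(); // names of the currently open elements
        StringBuilder text = null;               // non-null only while inside car > owner

        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();

                if (event.isStartElement()) {
                    String name = event.asStartElement().getName().getLocalPart();
                    // peek() is the parent here, because the current element is not pushed yet
                    if ("owner".equals(name) && "car".equals(path.peek())) {
                        text = new StringBuilder();
                    }
                    path.push(name);
                } else if (event.isCharacters() && text != null) {
                    // text may arrive in several chunks, so accumulate it
                    text.append(event.asCharacters().getData());
                } else if (event.isEndElement()) {
                    String name = path.pop();
                    if (text != null && "owner".equals(name)) {
                        owners.add(text.toString().trim());
                        text = null;
                    }
                }
            }
        } finally {
            reader.close();
        }
        return owners;
    }
}
```

With the reader from the question, this would be called as CarOwnerReader.readCarOwners(new FileInputStream(entry.toFile())). Only the stack of open element names and the current owner text are held in memory, so the file size should not matter much. For a very large number of cars you would probably hand each owner to a callback instead of collecting them in a list.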