I'm looking for a way to parse a large XML file using Kotlin.
My usual JSON parser is Jackson, and I know it can also be used to parse XML.
The source file is too large to be parsed using a DOM approach, so I must use the streaming API instead. I can find several examples of how to use the Jackson streaming API with JSON, but nothing about XML. The documentation at https://github.com/FasterXML/jackson-dataformat-xml says:
Although module implements low-level (JsonFactory / JsonParser / JsonGenerator) abstractions, most usage is through data-binding level. This because a small number of work-arounds have been added at data-binding level, to work around XML peculiarities:
and this made me worried about whether a streaming approach to XML with this library is even possible and/or supported.
Reading a tree structure requires processing the START tokens (e.g. <element> for XML, or { / [ for JSON); that is why it is not possible to read an entire object while it is still being processed in a streaming way.
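To make that concrete, here is a minimal sketch (plain StAX from the JDK, no Jackson) that lists the events a pull parser reports for one small element. The class name `StaxEvents` and the `events` helper are made up for illustration, and coalescing is switched on only so that each text node arrives as a single event.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxEvents {
    // Returns the start/text/end events reported for the given XML, in order.
    static List<String> events(String xml) {
        List<String> out = new ArrayList<>();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // Merge adjacent character data so one text node is one event.
            factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
            XMLStreamReader r = factory.createXMLStreamReader(new StringReader(xml));
            while (r.hasNext()) {
                switch (r.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        out.add("START " + r.getLocalName());
                        break;
                    case XMLStreamConstants.CHARACTERS:
                        if (!r.isWhiteSpace()) out.add("TEXT " + r.getText());
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        out.add("END " + r.getLocalName());
                        break;
                    default:
                        break; // END_DOCUMENT, comments, etc. are ignored in this sketch
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        events("<car><model>Audi A3</model><plate>21259</plate></car>")
                .forEach(System.out::println);
    }
}
```

The `END car` event is the last one reported, so a complete car only exists once its closing tag has been read.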
Take a root wrapper with a big list of cars (for brevity I use Lombok annotations):
@Getter
@Setter
@JacksonXmlRootElement
@NoArgsConstructor
@AllArgsConstructor
static class CarBook {
    @JacksonXmlProperty(isAttribute = true)
    private int version;
    @JacksonXmlElementWrapper(localName = "cars")
    @JacksonXmlProperty(localName = "car")
    private List<Car> cars;
}

@Getter
@Setter
@ToString
@NoArgsConstructor
@AllArgsConstructor
static class Car {
    private String model;
    private String plate;
}
Then you cannot get a CarBook object until the whole list (and maybe other members) has been fully read.
The usual way, then, is to use an XMLStreamReader and check token by token what you get, but you can use Jackson to parse entire objects using this XmlMapper method:
/**
 * Method for reading a single XML value from given XML-specific input
 * source; useful for incremental data-binding, combining traversal using
 * basic Stax {@link XMLStreamReader} with data-binding by Jackson.
 *
 * @since 2.4
 */
public <T> T readValue(XMLStreamReader r, Class<T> valueType) throws IOException {
    return readValue(r, _typeFactory.constructType(valueType));
}
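For comparison, a hand-written version of the token-by-token approach might look like the sketch below. It uses only the JDK's StAX API; `ManualCarReader`, `forEachCar` and the plain `Car` holder are hypothetical names standing in for the Lombok classes, and the element names are the ones from the classes above.

```java
import java.io.Reader;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class ManualCarReader {
    // Hypothetical plain holder, standing in for the Lombok-annotated Car.
    static final class Car {
        String model;
        String plate;
    }

    // Walks the stream event by event and hands each finished car to the sink.
    static void forEachCar(Reader source, Consumer<Car> sink) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(source);
            Car current = null;
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = r.getLocalName();
                    if ("car".equals(name)) {
                        current = new Car();
                    } else if (current != null && "model".equals(name)) {
                        current.model = r.getElementText(); // also consumes </model>
                    } else if (current != null && "plate".equals(name)) {
                        current.plate = r.getElementText(); // also consumes </plate>
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "car".equals(r.getLocalName())) {
                    sink.accept(current); // the car is complete only at its end tag
                    current = null;
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Every field needs its own branch here, which is the bookkeeping that passing the reader to XmlMapper is meant to avoid.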
As an example of the XmlMapper approach, take this big (1.2 GB) file:
<CarBook version="1"><cars>
<car><model>Alfa Romeo Spider</model><plate>27437</plate></car>
<car><model>Almera</model><plate>6429</plate></car>
<car><model>Audi 80 and 90</model><plate>4898</plate></car>
<car><model>Audi A3</model><plate>21259</plate></car>
<car><model>Audi A4</model><plate>21056</plate></car>
<car><model>Audi Coupé</model><plate>5623</plate></car>
<car><model>Austin Metro</model><plate>26446</plate></car>
<car><model>BMW 3 Series</model><plate>16338</plate></car>
<car><model>BMW 5 Series</model><plate>29859</plate></car>
...
Then you can read it lazily with:
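import com.fasterxml.jackson.dataformat.xml.XmlMapper;
import java.io.FileInputStream;
import java.io.IOException;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

import static javax.xml.stream.XMLStreamConstants.START_ELEMENT;

public class WithLazyJackson {
    // CarBook and Car (shown above) are nested in this class

    public static void main(String... args) throws IOException, XMLStreamException {
        XmlMapper xm = new XmlMapper();
        XMLInputFactory xif = XMLInputFactory.newInstance();
        XMLStreamReader xr = xif.createXMLStreamReader(new FileInputStream(/* 1.2 GB file */ "/home/josejuan/tmp/all.cars.xml"));
        // advance through the stream one event at a time
        while (xr.hasNext()) {
            xr.next();
            if (xr.getEventType() == START_ELEMENT) {
                System.out.println(xr.getLocalName());
                if ("car".equals(xr.getLocalName())) {
                    // bind the whole <car> subtree to a Car in one call
                    Car car = xm.readValue(xr, Car.class);
                    System.out.println(car);
                    if ("21056".equals(car.getPlate()))
                        break; // stop early; the rest of the file is not read
                }
            }
        }
        System.out.println("== End Of Process ==");
    }
}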
with output:
CarBook
cars
car
WithLazyJackson.Car(model=Alfa Romeo Spider, plate=27437)
car
WithLazyJackson.Car(model=Almera, plate=6429)
car
WithLazyJackson.Car(model=Audi 80 and 90, plate=4898)
car
WithLazyJackson.Car(model=Audi A3, plate=21259)
car
WithLazyJackson.Car(model=Audi A4, plate=21056)
== End Of Process ==
reading only 5 cars out of 19,800,000.
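If the loop above is needed in more than one place, it could be wrapped in a lazy Iterator. This is only a sketch under assumptions: `ElementIterator`, `ElementReader` and `open` are names invented here, and the callback is assumed to return a non-null value and to leave the stream at the end tag of the element it consumed. With Jackson the callback would presumably be `r -> xm.readValue(r, Car.class)`; the example uses plain StAX so that it runs without extra dependencies.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.Iterator;
import java.util.NoSuchElementException;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class ElementIterator<T> implements Iterator<T> {
    /** Hypothetical callback: turns the element at the cursor into a value. */
    interface ElementReader<R> {
        R read(XMLStreamReader reader) throws Exception;
    }

    private final XMLStreamReader reader;
    private final String localName;
    private final ElementReader<T> elementReader;
    private T pending; // next value, produced only when the caller asks for it

    ElementIterator(XMLStreamReader reader, String localName, ElementReader<T> elementReader) {
        this.reader = reader;
        this.localName = localName;
        this.elementReader = elementReader;
    }

    // Convenience factory that hides the checked StAX exception.
    static <T> ElementIterator<T> open(Reader source, String localName, ElementReader<T> elementReader) {
        try {
            return new ElementIterator<>(
                    XMLInputFactory.newInstance().createXMLStreamReader(source),
                    localName, elementReader);
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    @Override
    public boolean hasNext() {
        if (pending != null) return true;
        try {
            // Pull events only until the next matching start tag.
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && localName.equals(reader.getLocalName())) {
                    pending = elementReader.read(reader);
                    return true;
                }
            }
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        T value = pending;
        pending = null;
        return value;
    }

    public static void main(String[] args) {
        String xml = "<cars><plate>27437</plate><plate>6429</plate><plate>4898</plate></cars>";
        Iterator<String> plates = open(new StringReader(xml), "plate", XMLStreamReader::getElementText);
        while (plates.hasNext()) {
            String plate = plates.next();
            System.out.println(plate);
            if ("6429".equals(plate)) break; // the third element is never pulled
        }
    }
}
```

The caller decides when to stop, and the callback is not invoked for elements that are never requested.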