Tags: java, spring-boot, jaxb, apache-camel, stax

Splitting a large XML file with Apache Camel using split, stax, jaxb


I have a large XML file (possibly around a million records) on an SFTP server, and I don't want to load the entire file into memory. The intention is that my route picks up the file, splits it, uses the StAX builder to iterate over the elements, maps each element to a JAXB object, and sends it to a queue (or Spring Batch) so that it can be persisted to the database later.

input.xml (only 2 records as example)

<data>
    <PRODUCTNUMBER>
        <PRODUCTNUMBER>8D0201075E</PRODUCTNUMBER>
        <CURRGROSSPRICE>427.90</CURRGROSSPRICE>
        <NEXTGROSSPRICE>0.00</NEXTGROSSPRICE>
        <NEXTPRICEDATE>1900-01-01 00:00:00</NEXTPRICEDATE>
        <PRODUCTNAME_FR>Some description</PRODUCTNAME_FR>
        <PRODUCTNAME_NL>Some description</PRODUCTNAME_NL>
    </PRODUCTNUMBER>

    <PRODUCTNUMBER>
        <PRODUCTNUMBER>99630211802</PRODUCTNUMBER>
        <CURRGROSSPRICE>3.78</CURRGROSSPRICE>
        <NEXTGROSSPRICE>0.00</NEXTGROSSPRICE>
        <NEXTPRICEDATE>1900-01-01 00:00:00</NEXTPRICEDATE>
        <PRODUCTNAME_FR>Some description</PRODUCTNAME_FR>
    </PRODUCTNUMBER>
</data>

Camel route

from("sftp:localhost:22/in")
   .split(stax(PartRecords.class)).streaming()
   .marshal().json(JsonLibrary.Jackson, true)
   .to("rabbitmq://rabbitmq:5672/myExchange?queue=partQueue&routingKey=queue.part")
   .end();

PartRecord.java

@XmlRootElement(name = "PRODUCTNUMBER")
@XmlAccessorType(XmlAccessType.FIELD)
@Getter
@Setter
@ToString
public class PartRecord implements Serializable {

    @XmlElement(name = "PRODUCTNUMBER")
    private String productNumber;

    @XmlElement(name = "CURRGROSSPRICE")
    private BigDecimal currentPrice;

    @XmlElement(name = "PRODUCTNAME_NL")
    private String partDescriptionNL;

    @XmlElement(name = "PRODUCTNAME_FR")
    private String partDescriptionFR;

}

PartRecords.java

@XmlRootElement(name = "data")
@XmlAccessorType(XmlAccessType.FIELD)
@ToString
public class PartRecords implements Serializable {

    @XmlElement(name = "PRODUCTNUMBER")
    private List<PartRecord> partRecords;

    public List<PartRecord> getPartRecords() {
        if (partRecords == null) {
            partRecords = new ArrayList<>();
        }
        return partRecords;
    }

}

The route runs and a message is put on the queue, but instead of one message per record, the entire file ends up on the queue as a single JSON message. I guess this is normal behavior, so I need something extra. I don't know whether one message per record is a good idea, but I imagine a single message containing the entire file won't perform well either.

Current behavior output

{
  "partRecords" : [ {
    "productNumber" : "8D0201075E",
    "currentPrice" : 427.90,
    "partDescriptionNL" : "Some description",
    "partDescriptionFR" : "Some description"
  }, {
    "productNumber" : "99630211802",
    "currentPrice" : 3.78,
    "partDescriptionNL" : null,
    "partDescriptionFR" : "Some description"
  }]
}

What am I doing wrong? I'm using Spring Boot v2.5.4 and Apache Camel v3.11.1. Thanks in advance.


Solution

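  • Use PartRecord instead of PartRecords in your route. stax(...) splits on the element named in the given class's @XmlRootElement, so PartRecords (root element "data") matches the whole document and produces a single message, while PartRecord ("PRODUCTNUMBER") produces one message per record: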

     from("sftp:localhost:22/in")
       .split(stax(PartRecord.class)).streaming()
       .marshal().json(JsonLibrary.Jackson, true)
       .to("rabbitmq://rabbitmq:5672/myExchange?queue=partQueue&routingKey=queue.part")
       .end();
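
To see why the class passed to stax(...) matters, here is a minimal plain-StAX sketch of the same idea. It is not Camel's implementation: the class name StaxSplitSketch, the split helper and the shortened SAMPLE document are made up for illustration, and it collects text values instead of unmarshalling with JAXB. The only assumption is that the splitter emits one chunk per element whose name matches the requested one.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration only; names and sample data are not from Camel.
final class StaxSplitSketch {

    // Shortened version of input.xml from the question.
    static final String SAMPLE =
            "<data>"
            + "<PRODUCTNUMBER>"
            + "<PRODUCTNUMBER>8D0201075E</PRODUCTNUMBER>"
            + "<CURRGROSSPRICE>427.90</CURRGROSSPRICE>"
            + "</PRODUCTNUMBER>"
            + "<PRODUCTNUMBER>"
            + "<PRODUCTNUMBER>99630211802</PRODUCTNUMBER>"
            + "<CURRGROSSPRICE>3.78</CURRGROSSPRICE>"
            + "</PRODUCTNUMBER>"
            + "</data>";

    // Returns one chunk per outermost element called elementName.
    // Each chunk holds the text values found inside that element.
    static List<List<String>> split(String xml, String elementName) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // Deliver adjacent character data as a single event.
            factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            List<List<String>> chunks = new ArrayList<>();
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(elementName)) {
                    chunks.add(readSubtree(reader));
                }
            }
            reader.close();
            return chunks;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Consumes the current element up to its matching end tag, nested
    // elements included, and collects every non-blank text value.
    private static List<String> readSubtree(XMLStreamReader reader) throws XMLStreamException {
        List<String> values = new ArrayList<>();
        int depth = 1;
        while (depth > 0) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
            } else if (event == XMLStreamConstants.CHARACTERS && !reader.isWhiteSpace()) {
                values.add(reader.getText().trim());
            }
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println("data -> " + split(SAMPLE, "data").size() + " chunk(s)");
        System.out.println("PRODUCTNUMBER -> " + split(SAMPLE, "PRODUCTNUMBER").size() + " chunk(s)");
    }
}
```

Matching on "data" yields 1 chunk that swallows every record, which mirrors the single large message in the question. Matching on "PRODUCTNUMBER" yields 2 chunks, one per record, and the inner PRODUCTNUMBER field is consumed as part of its enclosing record rather than treated as a separate one.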