Search code examples
javaspringspring-bootspring-batchbatch-processing

SpringBatch- How to process files itself as a Item?


I am new to spring batch development. I have the following requirement. There will be a s3 source with zip files and each of the zipfiles will contain multiple pdf files and xml files.[Eg:100 pdfs and 100 xml files] (xml files will contain data about the pdf) Batch needs to read the pdf files and its associated xml file and push these to rest service/db.

When I looked at examples, most of it all covered how to read a line from the file and process it. here I have the items itself as file. I want to read one pdf file(as bytes) + xml file(converted into pojo) as set and push this to rest service one by one.

Right now, I am doing all the reading and processing inside a single tasklet. but I am sure there will be better solution to implement it. Please suggest and thank you.


Solution

  • The chunk-oriented processing model requires you to first define what an item is. In your case, one option is to consider an item as the PDF file (data) with its associated XML file (metadata). You can create a class that represents such an item and a custom item reader for it. Once that in place, you can use the reader in a chunk oriented step and a processor or writer that sends data to your REST endpoint.