java, spring-batch

In Spring Batch, how can I run a piece of code right after a chunk of items has been read, with that list of items as a parameter?


I'm using Spring Batch in chunk mode to process items. I read them in bulk (6000 items per chunk), process them one by one, and write them all at once. I read them via a JdbcCursorItemReader, which is very convenient for bulk reading and processing.

The problem is that once the items are read, I need to retrieve additional data from another source. The simplest way is to do it in the processor, calling a custom method like getAdditionalDataById(String id). The drawback is that it takes a lot of time. So I would like to retrieve that additional data in bulk too: just after reading 6000 items, get their ids and call something like getAllAdditionalDataByIdIn(List<String> ids).

But I don't know where to insert this code, because the @AfterRead annotation is invoked after each item and not after the whole bulk read. The same goes for @BeforeProcess. The only solution I have found so far is to do nothing in the processor and do everything in the writer (it's a custom writer): fetch the additional information, process the items, and write them.

Any help will be appreciated.

I'm using Spring Batch 4.0.1, reading from SQL Server and writing to Elasticsearch. The additional data is stored in Elasticsearch too. I've searched a bit in the code and a lot in the documentation, but I can't see any annotation or anything else that could help me.


Solution

  • The problem is that once the items are read, I need to retrieve additional data from another source. The simplest way is to do it in the processor, calling a custom method like getAdditionalDataById(String id). The drawback is that it takes a lot of time.

    This is known as the driving query pattern, where an item processor is used to enrich items with additional data (from another data source, for instance). This pattern can indeed introduce performance issues, as it requires an additional query for each item.

  • So I would like to retrieve that additional data in bulk too: just after reading 6000 items, get their ids and call something like getAllAdditionalDataByIdIn(List<String> ids).

    The closest you can get is ItemWriteListener#beforeWrite, which gives you access to the list of items before they are written. With the list of items in scope, you can collect their IDs and call your getAllAdditionalDataByIdIn(List<String> ids) method.
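
    A minimal sketch of that listener could look like the following. It is only an illustration under several assumptions: the Item class, the AdditionalDataService interface, the Map return type of getAllAdditionalDataByIdIn, and the BulkEnrichmentListener name are all hypothetical. The ItemWriteListener interface is declared locally so the example compiles on its own; it mirrors the three callbacks of the Spring Batch 4.x interface, and in a real job you would implement org.springframework.batch.core.ItemWriteListener instead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Local stand-in mirroring the Spring Batch 4.x ItemWriteListener callbacks,
// declared here only so the sketch compiles without Spring on the classpath.
interface ItemWriteListener<S> {
    void beforeWrite(List<? extends S> items);
    void afterWrite(List<? extends S> items);
    void onWriteError(Exception exception, List<? extends S> items);
}

// Hypothetical item type: the id comes from the reader, the extra field is filled in later.
class Item {
    final String id;
    String additionalData;

    Item(String id) {
        this.id = id;
    }
}

// Hypothetical bulk lookup; the Map<id, data> return type is an assumption.
interface AdditionalDataService {
    Map<String, String> getAllAdditionalDataByIdIn(List<String> ids);
}

// Enriches the whole chunk with one bulk call just before it is written.
class BulkEnrichmentListener implements ItemWriteListener<Item> {
    private final AdditionalDataService service;

    BulkEnrichmentListener(AdditionalDataService service) {
        this.service = service;
    }

    @Override
    public void beforeWrite(List<? extends Item> items) {
        // Collect the ids of every item in the chunk.
        List<String> ids = new ArrayList<>();
        for (Item item : items) {
            ids.add(item.id);
        }
        // One bulk query for the chunk instead of one query per item.
        Map<String, String> dataById = service.getAllAdditionalDataByIdIn(ids);
        // Attach the result to each item (null if the lookup has no entry for it).
        for (Item item : items) {
            item.additionalData = dataById.get(item.id);
        }
    }

    @Override
    public void afterWrite(List<? extends Item> items) {
        // Nothing to do after a successful write.
    }

    @Override
    public void onWriteError(Exception exception, List<? extends Item> items) {
        // Nothing to do here; error handling is left to the step configuration.
    }
}

class BulkEnrichmentDemo {
    public static void main(String[] args) {
        // Stub lookup standing in for the real Elasticsearch call.
        AdditionalDataService stub = ids -> {
            Map<String, String> result = new HashMap<>();
            for (String id : ids) {
                result.put(id, "extra-" + id);
            }
            return result;
        };
        List<Item> chunk = new ArrayList<>();
        chunk.add(new Item("1"));
        chunk.add(new Item("2"));
        new BulkEnrichmentListener(stub).beforeWrite(chunk);
        for (Item item : chunk) {
            System.out.println(item.id + " -> " + item.additionalData);
        }
    }
}
```

    Note that beforeWrite runs after the processor, so the additional data is not yet available while items are being processed; any logic that depends on it would have to run in the listener or in the writer. The listener would typically be registered on the step (for example with the step builder's listener(...) method), and the size of the list it receives follows the step's chunk size.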