I'm trying to build a new Spring Batch project, and I just want to execute these steps:
-> read the list of account IDs (more than 10,000)
--> for each account ID
---> read its purchases from the database
---> write the data into Elasticsearch
@Bean
public Job importPurchase(
        JobBuilderFactory jobBuilderFactory, Step findAccountStep, Step importPurchaseStep) {
    return jobBuilderFactory
            .get("importJob")
            .incrementer(new RunIdIncrementer())
            .start(findAccountStep)
            .next(importPurchaseStep)
            .build();
}
I have created a Spring Batch job with 2 steps and limited my first chunk to 1000 items (an Oracle IN query is limited to 1000 values). But my first step executes 10 chunks before the second step starts, so my list of account IDs ends up containing all 10,000 entries.
How can I read my account IDs and then fetch the purchases in batches of 1000?
Thanks! :)
You can use the driving query pattern with a single chunk-oriented step in which:

- the item reader reads the account IDs
- the item processor fetches the purchases for each account ID
- the item writer indexes the purchases into Elasticsearch
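For example, here is a minimal sketch of such a step (Purchase, PurchaseDao, findByAccountId and the bean names are assumptions for illustration, not your actual classes):

import java.util.List;

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;

// Driving query pattern: the reader drives the step with account IDs,
// the processor issues one additional query per ID to load the purchases.
@Bean
public Step importPurchaseStep(
        StepBuilderFactory stepBuilderFactory,
        ItemReader<Long> accountIdReader,
        ItemProcessor<Long, List<Purchase>> purchaseProcessor,
        ItemWriter<List<Purchase>> elasticsearchWriter) {
    return stepBuilderFactory
            .get("importPurchaseStep")
            .<Long, List<Purchase>>chunk(1000)
            .reader(accountIdReader)       // reads account IDs (the driving query)
            .processor(purchaseProcessor)  // loads the purchases for each account ID
            .writer(elasticsearchWriter)   // indexes the purchases into Elasticsearch
            .build();
}

@Bean
public ItemProcessor<Long, List<Purchase>> purchaseProcessor(PurchaseDao purchaseDao) {
    // one query per account ID (hypothetical DAO method),
    // so there is no Oracle IN clause limit to work around
    return accountId -> purchaseDao.findByAccountId(accountId);
}

With this approach there is no IN clause at all, so the 1000-value Oracle limit does not apply, and the whole import happens in a single step.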
It should be noted that this pattern works well for small/medium data sets, but not so well for large data sets, as it requires an additional query for each item. In your case, it should perform well since the input data set size is still reasonable.
EDIT: How to scale this pattern?
It is possible to use the same pattern on a partitioned input: for example, by partitioning the 10,000 account IDs into ranges and creating a partition for each range. Spring Batch provides a sample for this in the ColumnRangePartitioner.
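A minimal sketch of such a partitioner, in the spirit of that sample (the min/max IDs would come from a query like SELECT MIN(ID), MAX(ID) FROM ACCOUNT; the minValue/maxValue key names follow the sample's convention):

import java.util.HashMap;
import java.util.Map;

import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.item.ExecutionContext;

public class AccountIdRangePartitioner implements Partitioner {

    private final long minId; // e.g. SELECT MIN(ID) FROM ACCOUNT
    private final long maxId; // e.g. SELECT MAX(ID) FROM ACCOUNT

    public AccountIdRangePartitioner(long minId, long maxId) {
        this.minId = minId;
        this.maxId = maxId;
    }

    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {
        Map<String, ExecutionContext> partitions = new HashMap<>();
        long rangeSize = (maxId - minId) / gridSize + 1;
        long start = minId;
        for (int i = 0; i < gridSize; i++) {
            // each partition gets a contiguous range of account IDs
            ExecutionContext context = new ExecutionContext();
            context.putLong("minValue", start);
            context.putLong("maxValue", Math.min(start + rangeSize - 1, maxId));
            partitions.put("partition" + i, context);
            start += rangeSize;
        }
        return partitions;
    }
}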
Once the partitions are created, you can create a partitioned step where each worker applies the driving query pattern to the partition assigned to it. Workers can be local threads or remote JVMs, which allows you to scale your processing horizontally.
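For local threads, a sketch of the partitioned (manager) step could look like this (bean names are assumptions; importPurchaseStep is the worker step that applies the driving query pattern to its range):

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.context.annotation.Bean;
import org.springframework.core.task.SimpleAsyncTaskExecutor;

@Bean
public Step partitionedStep(
        StepBuilderFactory stepBuilderFactory,
        Partitioner accountIdRangePartitioner,
        Step importPurchaseStep) {
    return stepBuilderFactory
            .get("partitionedStep")
            .partitioner("importPurchaseStep", accountIdRangePartitioner)
            .step(importPurchaseStep)                    // worker step, one execution per partition
            .gridSize(4)                                 // number of partitions
            .taskExecutor(new SimpleAsyncTaskExecutor()) // local threads as workers
            .build();
}

Each worker's reader would typically be @StepScope so it can read its minValue/maxValue range from the step execution context.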