hibernate, spring-batch, chunking

Spring Batch chunk processing: how does the reader work if the result set changes?


I'm new to Spring Batch chunking. I want to understand how the reader works.

Here is the scenario: I'm implementing a purge of user accounts as a chunk-oriented step.

Reader: reads all the user accounts that match the purge criteria, in a defined order.

Processor: for each user account, based on some calculation, it may create a new user account and also change the current record (say, mark it as purged).

Question: how does the reader work? Say I have 5000 user accounts and my chunk size is 1000.

Will the reader read 1000 records and then start the processor (say the processor creates another 100 new records), after which the writer writes whichever records were updated?

To read the next 1000 records, will the reader execute the query again? How does it know where to start?

I'm using Hibernate.


Solution

  • To answer your specific question: it depends on the ItemReader implementation you use. If you're using the JdbcCursorItemReader, we hold the cursor open during the entire process, so we're really reading from a single execution of one query. If you're using the JdbcPagingItemReader, a separate query is run for each page, and where the next page begins is determined by the pagination logic.

    A couple notes:

    1. Using Hibernate can be tricky with batch processing. There are added complexities when using Hibernate that you can avoid when going straight to the database (not to mention potential performance benefits in a batch environment).
    2. Keep in mind that Spring Batch does not check whether the underlying dataset has changed. If you're using the JdbcPagingItemReader, each page is fetched by its own query, so if you add records that meet the criteria, they will be returned as well. (I'm not 100% sure what would happen if the underlying data changed while a cursor was open; it may be a function of the db itself.) Typically, you'll tag the records you want to process in that batch run with some form of flag (timestamp, processing flag, etc.) and have the reader select only the tagged records.
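
To make the second note concrete, here is a minimal in-memory sketch of a paging-style read. It is not Spring Batch code and does not use any Spring Batch API. The `Account` class, the `run` method, and the `inRun` flag are hypothetical names invented for illustration. It assumes a reader that issues a fresh query for every page and resumes after the last id it saw, and a processor that marks each account as purged and creates one new account for every original account with an odd id.

```java
import java.util.ArrayList;
import java.util.List;

class PagingDemo {
    // Hypothetical stand-in for a row in the user-account table.
    static final class Account {
        final int id;
        final boolean inRun; // true if the row was tagged for this batch run
        boolean purged;

        Account(int id, boolean inRun) {
            this.id = id;
            this.inRun = inRun;
        }
    }

    // Simulates a paged read; returns the ids the "reader" handed to the processor.
    static List<Integer> run(boolean restrictToTaggedRows) {
        List<Account> table = new ArrayList<>();
        for (int id = 1; id <= 5; id++) {
            table.add(new Account(id, true)); // original rows, tagged for this run
        }
        int pageSize = 2;
        int lastId = 0; // where the next page starts (resume after this id)
        int nextId = 6; // id to give the next newly created account
        List<Integer> processed = new ArrayList<>();

        while (true) {
            // Each page is a fresh "query" against the current table state.
            List<Account> page = new ArrayList<>();
            for (Account a : table) {
                if (page.size() == pageSize) {
                    break;
                }
                boolean matches = !a.purged && a.id > lastId
                        && (!restrictToTaggedRows || a.inRun);
                if (matches) {
                    page.add(a);
                }
            }
            if (page.isEmpty()) {
                break; // no more matching rows
            }
            for (Account a : page) {
                processed.add(a.id);
                a.purged = true; // the processor marks the current record
                lastId = a.id;
                if (a.inRun && a.id % 2 == 1) {
                    // The processor creates a new, untagged account.
                    table.add(new Account(nextId++, false));
                }
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println("untagged query: " + run(false));
        System.out.println("tagged query:   " + run(true));
    }
}
```

In this sketch, `run(false)` processes ids 1 through 8, because the three accounts created mid-run match the criteria and are picked up by later pages. `run(true)` processes only ids 1 through 5, because the query is restricted to rows tagged before the run started. A real reader may paginate differently (for example by offset rather than by last key), so treat this only as an illustration of why tagging the rows up front gives a stable result set.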