Question:
I'm experiencing an issue with the RepositoryItemReader in my Spring Batch job where it is skipping items during the read process. I have configured the RepositoryItemReader with a specific paging size and aligned it with the chunk size of the step. However, some items are being skipped, and I'm unable to determine the root cause of the problem.
I read the answer here before writing this question, but I couldn't find a solution for my case.
Here are the relevant details of my setup:
I'm using Spring Batch v5.
The RepositoryItemReader is reading data from a repository using the findByIsEnabled method.
I have set the pageSize of the RepositoryItemReader to the same value as the chunk size in the step definition.
I have verified that there are no filters or conditions applied to the RepositoryItemReader that could exclude items.
The repository used by the RepositoryItemReader is transactional, and the methods called by the reader are executed within the same transaction. I have checked the data source for any inconsistencies or missing items that could cause skipping, but everything seems to be in order.
Nearly half of the pages are skipped (4961/1000 items ratio).
Update: I have created a simple demo application to reproduce the issue. You can find the code on my GitHub repository: GitHub Repository Link
The example at the top works with the embedded H2 database; you can just run it and see the result.
I suspect there might be an issue with the pagination logic or some misconfiguration that causes the reader to skip the next page. I have reviewed the Spring Batch documentation and tried various configurations, but I couldn't identify the root cause of this behavior.
Could someone please help me identify potential causes for this skipping behavior with the RepositoryItemReader? Any suggestions or insights would be greatly appreciated.
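Answer:
At first glance, the problem in your code is that the ItemProcessor modifies the column enabled, which is also used as a filter in the reader. This can't work with a paging reader. The reader requests each page by offset, so once the rows of one page no longer match the filter, the remaining rows shift forward and the next page request jumps over them. Every chunk therefore gets a different set of rows because of the change made in the processor.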
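To make the effect concrete, here is a minimal plain-Java sketch that imitates the behaviour without Spring. It assumes the reader filters on the flag first and then applies offset and limit, and that each chunk's changes are committed before the next page is read. The class and method names (PagingSkipDemo, findEnabledPage, processAll) are invented for this illustration and are not part of Spring Batch or the demo project.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PagingSkipDemo {

    // Imitates findByIsEnabled(true, pageable): filter on the flag, then apply offset and limit.
    static List<Integer> findEnabledPage(boolean[] enabled, int page, int pageSize) {
        List<Integer> matching = new ArrayList<>();
        for (int id = 0; id < enabled.length; id++) {
            if (enabled[id]) {
                matching.add(id);
            }
        }
        int from = Math.min(page * pageSize, matching.size());
        int to = Math.min(from + pageSize, matching.size());
        return new ArrayList<>(matching.subList(from, to));
    }

    // Reads page after page and disables every row it processes.
    // Returns the number of rows that were actually processed.
    static int processAll(int totalRows, int pageSize) {
        boolean[] enabled = new boolean[totalRows];
        Arrays.fill(enabled, true);
        int processed = 0;
        for (int page = 0; ; page++) {
            List<Integer> chunk = findEnabledPage(enabled, page, pageSize);
            if (chunk.isEmpty()) {
                break;
            }
            for (int id : chunk) {
                enabled[id] = false; // the "processor" changes the filter column
                processed++;
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        // Page 0 handles ids 0..9; page 1 then starts at offset 10 of the
        // remaining rows (ids 10..99), so ids 10..19 are never read.
        System.out.println(processAll(100, 10)); // prints 50
    }
}
```

In this simulation every second page is lost, which is consistent with the "nearly half" reported in the question.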
Solution
Don't change the enabled column in the processor. Use a different column, such as processed, and after the batch has completed, run an SQL statement to update the enabled column.
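The two-phase idea can be sketched in plain Java as follows. This is only a simulation under stated assumptions, not Spring Batch code: the class ProcessedFlagDemo and its methods are hypothetical, arrays stand in for the table, and the SQL in the comment uses a made-up table name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ProcessedFlagDemo {
    final boolean[] enabled;
    final boolean[] processed;

    ProcessedFlagDemo(int totalRows) {
        enabled = new boolean[totalRows];
        processed = new boolean[totalRows];
        Arrays.fill(enabled, true);
    }

    // Imitates a paged query that filters on enabled only.
    List<Integer> findEnabledPage(int page, int pageSize) {
        List<Integer> matching = new ArrayList<>();
        for (int id = 0; id < enabled.length; id++) {
            if (enabled[id]) {
                matching.add(id);
            }
        }
        int from = Math.min(page * pageSize, matching.size());
        int to = Math.min(from + pageSize, matching.size());
        return new ArrayList<>(matching.subList(from, to));
    }

    // Phase 1: the chunk step only sets processed, so the filter result stays stable.
    int markAllProcessed(int pageSize) {
        int count = 0;
        for (int page = 0; ; page++) {
            List<Integer> chunk = findEnabledPage(page, pageSize);
            if (chunk.isEmpty()) {
                break;
            }
            for (int id : chunk) {
                processed[id] = true;
                count++;
            }
        }
        return count;
    }

    // Phase 2: run once after the step, comparable to (hypothetical table name)
    // UPDATE my_table SET enabled = false WHERE processed = true
    int disableProcessed() {
        int updated = 0;
        for (int id = 0; id < enabled.length; id++) {
            if (processed[id] && enabled[id]) {
                enabled[id] = false;
                updated++;
            }
        }
        return updated;
    }

    long countEnabled() {
        long n = 0;
        for (boolean e : enabled) {
            if (e) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        ProcessedFlagDemo demo = new ProcessedFlagDemo(100);
        System.out.println(demo.markAllProcessed(10)); // prints 100
        System.out.println(demo.disableProcessed());   // prints 100
        System.out.println(demo.countEnabled());       // prints 0
    }
}
```

Because the filter column is untouched while pages are being read, every page offset still points at the rows it was meant to return. In a real job the second phase could be a separate step that runs after the chunk-oriented step.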