I have a Spring Batch job that processes a large number of items. For each item, it calls an external service (assume a stored procedure or a REST service) that does some business calculations and updates a database; these results are used to generate some analytical reports. Each item is independent, so I am partitioning the external calls into 10 partitions in the same JVM. For example, if there are 50 items to process, each partition will have 50/10 = 5 items to process.
This external service can return a SUCCESS or FAILURE code. All the business logic is encapsulated in this external service, so the worker step is a tasklet which just calls the external service and receives the SUCCESS/FAILURE flag. I want to store the SUCCESS/FAILURE flag for each item and retrieve them when the job is over. These are the approaches I can think of:

1. Store the SUCCESS/FAILURE flags in a collection and put that collection in the job execution context. Spring Batch persists the execution context and I can retrieve it at the end of the job. This is the most naïve way, and it causes thread contention when all 10 worker steps try to access and modify the same collection.
2. Use a CopyOnWriteArrayList instead. But this is too costly, and the whole purpose of partitioning is defeated when each worker step is waiting to access the list.

Are there any better ways to do this?
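For reference, here is a simplified sketch of my current setup (the ExternalService interface, bean names, the "itemIds" key, and the partitioner wiring are placeholders, not my actual code):

```java
import java.util.List;

import org.springframework.batch.core.Step;
import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.task.TaskExecutor;
import org.springframework.transaction.PlatformTransactionManager;

@Configuration
public class PartitionedJobConfig {

    // Placeholder for the external service (stored procedure / REST call).
    public interface ExternalService {
        String call(Long itemId); // returns "SUCCESS" or "FAILURE"
    }

    @Bean
    public Step managerStep(JobRepository jobRepository, Step workerStep,
                            Partitioner itemIdPartitioner, TaskExecutor taskExecutor) {
        return new StepBuilder("managerStep", jobRepository)
                .partitioner("workerStep", itemIdPartitioner) // partitioner (not shown) splits item IDs into 10 subsets
                .step(workerStep)
                .gridSize(10)
                .taskExecutor(taskExecutor) // all partitions run in the same JVM
                .build();
    }

    @Bean
    public Step workerStep(JobRepository jobRepository, PlatformTransactionManager transactionManager,
                           ExternalService externalService) {
        return new StepBuilder("workerStep", jobRepository)
                .tasklet((contribution, chunkContext) -> {
                    // each partition receives its own subset of item IDs via the step execution context
                    @SuppressWarnings("unchecked")
                    List<Long> itemIds = (List<Long>) chunkContext.getStepContext()
                            .getStepExecutionContext().get("itemIds");
                    for (Long itemId : itemIds) {
                        String status = externalService.call(itemId); // SUCCESS or FAILURE
                        // this is the flag I want to collect for each item without contention
                    }
                    return RepeatStatus.FINISHED;
                }, transactionManager)
                .build();
    }
}
```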
You still did not answer the question about which item writer you are going to use, so I will try to answer your question and show you why this detail is key to choosing the right solution to your problem.
Here is your requirement:
I have a spring batch job. It processes a large number of items.
For each item, it calls an external service (assume a stored procedure
or a REST service. This does some business calculations and updates a database.
In your description, you are talking about storing item IDs with their status in the job execution context. While this is possible, what I'm saying is that if you are going to write items to a table anyway in which you have a column with a status flag, you don't need to use the job execution context at all. Hence my question:
are you going to write the items themselves to a persistent store?
The item writer is required in a chunk-oriented step, and the solution
depends on how you are going to write items (also, is the success/failure
status just a flag, or a different object with more information, etc.).
Where are those items going to be written? A table, a file, the standard
output with System.out?
So I will assume you are going to write items to a table having a status column, since you said "This does some business calculations and updates a database".
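Under that assumption, the domain object could look something like this (class and field names are made up for illustration):

```java
// Hypothetical domain object mapped to the table that already has a status column.
public class Item {

    private Long id;
    private String status; // "SUCCESS" or "FAILURE", set by the item processor

    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }

    public String getStatus() { return status; }
    public void setStatus(String status) { this.status = status; }
}
```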
You can use an item processor to do the business logic and flag items with their status (i.e. your domain object has a status field that the processor sets as needed). The item writer then updates the items in the database with their status. This approach solves all the issues listed above by design, as it does not require the job execution context, and it is a good option for a multi-threaded or partitioned step (since items are independent).
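A rough sketch of that approach, assuming the Item class above and a JdbcBatchItemWriter that updates the status column (the ExternalService placeholder, table and column names are illustrative, not a definitive implementation):

```java
import javax.sql.DataSource;

import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class StatusStepConfig {

    // Placeholder for the external service that does the business calculation.
    public interface ExternalService {
        String call(Long itemId); // returns "SUCCESS" or "FAILURE"
    }

    // Processor: runs the business call and flags the item with its status.
    @Bean
    public ItemProcessor<Item, Item> statusProcessor(ExternalService externalService) {
        return item -> {
            item.setStatus(externalService.call(item.getId()));
            return item;
        };
    }

    // Writer: updates each item in the database with its status.
    @Bean
    public JdbcBatchItemWriter<Item> itemWriter(DataSource dataSource) {
        return new JdbcBatchItemWriterBuilder<Item>()
                .dataSource(dataSource)
                .sql("UPDATE items SET status = :status WHERE id = :id")
                .beanMapped() // binds :status and :id from the Item getters
                .build();
    }
}
```

With this design, each partition (or thread) processes and writes its own items, and the statuses end up in the database where your reporting can query them, instead of in a shared in-memory collection.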