I'm using a Spring Boot 2.4.x app with Spring Batch 4.3.x. I've created a simple job with a FlatFileItemReader that reads from a CSV file and an ImportKafkaItemWriter that writes to a Kafka topic, combined in a single step. I'm using SimpleJobLauncher with a ThreadPoolTaskExecutor set as the TaskExecutor of the JobLauncher. It works as I expected. But I have one resilience use case: if I kill the app, then restart it and trigger the job again, it should carry on and finish the remaining work. Unfortunately that is not happening. I investigated further and found that when I forcibly close the app, the key Spring Batch job repository tables look like this:
| job_execution_id | version | job_instance_id | create_time | start_time | end_time | status | exit_code | exit_message | last_updated | job_configuration_location |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 2021-06-16 09:32:43 | 2021-06-16 09:32:43 | | STARTED | UNKNOWN | | 2021-06-16 09:32:43 | |
and
| step_execution_id | version | step_name | job_execution_id | start_time | end_time | status | commit_count | read_count | filter_count | write_count | read_skip_count | write_skip_count | process_skip_count | rollback_count | exit_code | exit_message | last_updated |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 4 | productImportStep | 1 | 2021-06-16 09:32:43 | | STARTED | 3 | 6 | 0 | 6 | 0 | 0 | 0 | 0 | EXECUTING | | 2021-06-16 09:32:50 |
If I manually update these tables, setting a valid end_time and the status to FAILED, then I can restart the job and it works absolutely fine. What do I need to do so that Spring Batch updates the relevant repository tables appropriately and I can avoid these manual steps? I can provide more information about the code if needed.
> If I manually update these tables, setting a valid end_time and the status to FAILED, then I can restart the job and it works absolutely fine. What do I need to do so that Spring Batch updates the relevant repository tables appropriately and I can avoid these manual steps?
When a job is killed abruptly, Spring Batch has no chance to update its status in the job repository, so the status is stuck at `STARTED`. Now when the job is restarted, the only information Spring Batch has is the status in the job repository. By just looking at the status in the database, Spring Batch cannot distinguish between a job that is effectively running and a job that has been killed abruptly (in both cases, the status is `STARTED`).
The way to go is indeed to manually update the tables: mark the status as `FAILED` to be able to restart the job, or as `ABANDONED` to abandon it. This is a business decision that you have to make, and there is no way to automate it on the framework side. For more details, please refer to the reference documentation here: Aborting a Job.
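For reference, the manual fix can be sketched as two updates against the standard Spring Batch meta-data tables (`BATCH_JOB_EXECUTION` and `BATCH_STEP_EXECUTION`). The `JOB_EXECUTION_ID = 1` predicate matches the stuck execution shown in your tables above; adjust it for the execution you actually need to repair:

```sql
-- Mark the stuck step execution as failed (status was STARTED / EXECUTING)
UPDATE BATCH_STEP_EXECUTION
SET STATUS = 'FAILED', EXIT_CODE = 'FAILED', END_TIME = CURRENT_TIMESTAMP
WHERE JOB_EXECUTION_ID = 1 AND STATUS = 'STARTED';

-- Mark the stuck job execution as failed so the job becomes restartable
UPDATE BATCH_JOB_EXECUTION
SET STATUS = 'FAILED', EXIT_CODE = 'FAILED', END_TIME = CURRENT_TIMESTAMP
WHERE JOB_EXECUTION_ID = 1 AND STATUS = 'STARTED';
```

Use `ABANDONED` instead of `FAILED` in both statements if you decide the execution should not be restarted. Exact column and table names can vary if you customized the table prefix.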