batch-processing websphere-liberty jsr352

Job not stopping when the stop request is made through Postman

I am using an IBM Liberty server to run a JSR 352 batch job, which I launch with a REST call from Postman. When I try to stop the job using its instance ID, the status changes to "STOPPING", but the job takes a long time to actually stop. Sometimes it stays in "STOPPING" for a few hours. How can I force the job to stop?

Note: The job has a partitioned step that reads from a database and creates an output file.

Using the job's instance ID, I try to stop the job with a PUT request from Postman, like below:

//put method from postman
https://*****:9443/ibm/api/batch/jobinstances/405573?action=stop

Response returned:

    "jobName": "test-job",
    "executionId": 405574,
    "instanceId": 405573,
    "batchStatus": "STOPPING",
    "exitStatus": "",

I would expect the batch job to have stopped by the time I check its status with the URL below, at least after a few minutes or an hour. But in some cases it takes a few hours.

https://******:9443/ibm/api/batch/jobinstances/405573

Solution

Basics on "stop", for Batchlet vs Chunk steps

For a chunk step, the container checks whether a stop has been issued after each item is read and processed. The assumption is that the chunk step will read, process, and write multiple chunks of multiple items, so a stop check after each item is enough to stop relatively soon. With a batchlet step, on the other hand, the application processing isn't broken up into anything known to the container, so the batch container instead invokes the user-implemented stop() on a separate thread, which the application can use to interrupt processing on the main process() thread.

Ideas on Chunk step taking a long time to stop

For a chunk step taking a long time to respond to a stop request, one explanation could simply be that the application is taking a correspondingly long time to read and process a single item.
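
To make that concrete, here is a simplified mental model of the read/process loop, not Liberty's actual implementation. The class ChunkLoopModel and its names are hypothetical, and writing and checkpointing are left out. It only illustrates that the stop flag is looked at between items, so a stop cannot take effect faster than one item's read-plus-process time.

```java
import java.util.Iterator;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical, simplified model of a chunk step's item loop.
// It is NOT the container's real code; it only shows where the stop check sits.
class ChunkLoopModel {
    // Set to true by the "stop request" (in a real server, the REST stop call).
    final AtomicBoolean stopRequested = new AtomicBoolean(false);

    // Reads and processes items until the reader is exhausted or a stop is seen.
    // Returns how many items were handled.
    int run(Iterator<String> reader) {
        int handled = 0;
        while (reader.hasNext()) {
            String item = reader.next();   // stands in for readItem(): may be slow
            item.toUpperCase();            // stands in for processItem(): may be slow
            handled++;
            // The stop flag is only consulted here, after a full item.
            if (stopRequested.get()) {
                break;
            }
        }
        return handled;
    }
}
```

In this model, even when the stop is requested before the loop starts, one whole item is still read and processed before the loop exits. If a single item takes an hour, the job sits in "STOPPING" for up to that hour.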

If it were important enough, this could perhaps be addressed by refactoring item read and processing into more fine-grained logic, so each item is processed more quickly.

Another approach, if the logic can't easily be broken down into smaller "items", would be to refactor this into a batchlet, where you can implement the stop() method yourself, and react appropriately within your application. After all, there's a good chance if you couldn't break down the chunk into smaller items then you weren't getting much value out of checkpoints anyway.
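
A minimal sketch of a stop-aware batchlet could look like the following. To keep it compilable without the batch API on the classpath, the class does not declare the interface; in a real job it would implement javax.batch.api.Batchlet (or extend AbstractBatchlet), whose process() is declared to throw Exception. The class name, the unit-of-work split, and the exit status strings are all assumptions for illustration.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch only: in a real job this would be declared as
//   class StoppableBatchlet implements javax.batch.api.Batchlet
// The process()/stop() pair mirrors that interface's two methods.
class StoppableBatchlet {
    // Written by stop() on the container's thread, read by process() on the step thread.
    private final AtomicBoolean stopRequested = new AtomicBoolean(false);
    private final int totalUnits;   // hypothetical: number of work units to perform
    private int unitsDone = 0;

    StoppableBatchlet(int totalUnits) {
        this.totalUnits = totalUnits;
    }

    // Main work loop; the returned string is used as the step's exit status.
    public String process() {
        while (unitsDone < totalUnits) {
            // Check the flag between small units of work so a stop is honored quickly.
            if (stopRequested.get()) {
                return "STOPPED";   // arbitrary exit status chosen for this sketch
            }
            doOneUnitOfWork();
            unitsDone++;
        }
        return "COMPLETED";
    }

    // Called on a separate thread when a stop is requested for the job.
    public void stop() {
        stopRequested.set(true);
    }

    // Placeholder for the application's real work, e.g. handling one batch of rows.
    private void doOneUnitOfWork() {
    }

    int getUnitsDone() {
        return unitsDone;
    }
}
```

How quickly this responds depends on how small each unit of work is. If a unit blocks on a long database call, stop() would additionally need to cancel that call or interrupt the worker thread, which this sketch does not attempt.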