Tags: spring, spring-batch, spring-cloud, spring-cloud-dataflow, spring-cloud-task

Spring Cloud Task - Remote Partitioning Concerns


We have a local Spring Cloud Data Flow setup, and the task runs a Spring Batch job that reads from a database and writes to AWS S3. All of this works fine.

  1. When it comes to stopping the job, the task stops, but resuming the job is not possible since the status remains "STARTED". I think we can handle this in code by setting the batch status to 'STOPPED' when the stop is triggered; correct me if this can't be handled.
  2. Also, when trying to stop an individual slave task, there is an error:

    2020-03-27 10:48:48.140 INFO 11258 --- [nio-9393-exec-7] .s.c.d.s.s.i.DefaultTaskExecutionService : Task execution stop request for id 192 for platform default has been submitted
    2020-03-27 10:48:48.144 ERROR 11258 --- [nio-9393-exec-7] o.s.c.d.s.c.RestControllerAdvice : Caught exception while handling a request

    java.lang.NullPointerException: null
        at org.springframework.cloud.dataflow.server.service.impl.DefaultTaskExecutionService.cancelTaskExecution(DefaultTaskExecutionService.java:669) ~[spring-cloud-dataflow-server-core-2.3.0.RELEASE.jar!/:2.3.0.RELEASE]
        at org.springframework.cloud.dataflow.server.service.impl.DefaultTaskExecutionService.lambda$stopTaskExecution$0(DefaultTaskExecutionService.java:583) ~[spring-cloud-dataflow-server-core-2.3.0.RELEASE.jar!/:2.3.0.RELEASE]

  3. How do we implement this in a distributed environment, where a master server starts the master step and the workers are started on their respective slave servers?


Solution

  • 1) You are correct; you will need to change your status from STARTED to FAILED (see the first sketch after this list).

    2) Since remote partitioning uses Spring Cloud Deployer (not Spring Cloud Data Flow) to launch the worker tasks, SCDF does not have a way to determine the platform information needed to properly stop the worker task. I've added GH issue spring-cloud/spring-cloud-dataflow#3857 to resolve this problem.

    3) The current implementation prevents a user from launching on multiple servers; rather, it lets the platform (Kubernetes, Cloud Foundry) distribute the worker tasks. You can implement your own deployer to add this feature (see the second sketch below).
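
For 1), a minimal sketch of moving a stuck execution out of STARTED through the standard Spring Batch APIs might look like the following. The StuckJobCleaner class name and method are assumptions for illustration; JobExplorer and JobRepository are the standard Spring Batch beans:

    import java.util.Date;

    import org.springframework.batch.core.BatchStatus;
    import org.springframework.batch.core.ExitStatus;
    import org.springframework.batch.core.JobExecution;
    import org.springframework.batch.core.explore.JobExplorer;
    import org.springframework.batch.core.repository.JobRepository;

    // Hypothetical helper, not part of the original answer.
    public class StuckJobCleaner {

        private final JobExplorer jobExplorer;
        private final JobRepository jobRepository;

        public StuckJobCleaner(JobExplorer jobExplorer, JobRepository jobRepository) {
            this.jobExplorer = jobExplorer;
            this.jobRepository = jobRepository;
        }

        /**
         * Moves an execution that is stuck in STARTED to FAILED so that
         * Spring Batch will allow the job to be restarted. STOPPED would
         * also work, since both states are restartable.
         */
        public void markFailed(long executionId) {
            JobExecution execution = jobExplorer.getJobExecution(executionId);
            if (execution != null && execution.getStatus() == BatchStatus.STARTED) {
                execution.setStatus(BatchStatus.FAILED);
                execution.setEndTime(new Date());
                execution.setExitStatus(ExitStatus.FAILED);
                jobRepository.update(execution);
            }
        }
    }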
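For 3), writing your own deployer means implementing the Spring Cloud Deployer TaskLauncher SPI. The skeleton below shows the contract; the SshTaskLauncher name and the idea of pushing workers to your slave servers over SSH are assumptions for illustration, not an existing implementation:

    import org.springframework.cloud.deployer.spi.core.AppDeploymentRequest;
    import org.springframework.cloud.deployer.spi.core.RuntimeEnvironmentInfo;
    import org.springframework.cloud.deployer.spi.task.LaunchState;
    import org.springframework.cloud.deployer.spi.task.TaskLauncher;
    import org.springframework.cloud.deployer.spi.task.TaskStatus;

    // Hypothetical skeleton; the remote-execution mechanics are up to you.
    public class SshTaskLauncher implements TaskLauncher {

        @Override
        public String launch(AppDeploymentRequest request) {
            // Pick a slave server (round-robin, least-loaded, etc.), start the
            // worker app there, and return a launch id the other methods can
            // use to find that remote process again.
            throw new UnsupportedOperationException("not implemented in this sketch");
        }

        @Override
        public void cancel(String id) {
            // Stop the remote process identified by the launch id.
        }

        @Override
        public TaskStatus status(String id) {
            // Report the remote process state back to the caller.
            return new TaskStatus(id, LaunchState.unknown, null);
        }

        @Override
        public void cleanup(String id) {
            // Remove any resources tied to a single launch.
        }

        @Override
        public void destroy(String appName) {
            // Remove any resources tied to the app as a whole.
        }

        @Override
        public RuntimeEnvironmentInfo environmentInfo() {
            return new RuntimeEnvironmentInfo.Builder()
                    .spiClass(TaskLauncher.class)
                    .implementationName("ssh-task-launcher")
                    .implementationVersion("0.1")
                    .platformType("SSH")
                    .platformApiVersion("n/a")
                    .platformClientVersion("n/a")
                    .platformHostVersion("n/a")
                    .build();
        }
    }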