
Is it suitable to use Spring Cloud DataFlow to orchestrate long-running external batch jobs inside continuously running apps?


We have Spring Batch applications with triggers defined in each app.

Each Batch application runs tens of similar jobs with different parameters and can do so within 1400 MiB per app.

We use Spring Batch Admin, which was deprecated years ago, to launch individual jobs and to get a brief overview of what is going on in them. The migration guide recommends replacing Spring Batch Admin with Spring Cloud DataFlow.

The Spring Cloud DataFlow docs talk about grabbing a jar from a Maven repo and running it with some parameters. I don't like the idea of waiting 20 seconds for the application to download and 2 minutes for it to launch, plus all the security/certificate/firewall issues (how can I download a proprietary jar across intranets?).

I'd like to register the existing applications in Spring Cloud DataFlow via IP/port, pass job definitions to the Spring Batch applications, and monitor executions (including the ability to stop a job). Is Spring Cloud DataFlow usable for that?


Solution

  • A few things to unpack here. Here's an attempt at it.

    The Spring Cloud DataFlow docs talk about grabbing a jar from a Maven repo and running it with some parameters. I don't like the idea of waiting 20 seconds for the application to download and 2 minutes for it to launch, plus all the security/certificate/firewall issues

    Yes, there's an app resolution process. However, once downloaded, the app is reused from the local Maven cache instead of being fetched again.
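
    For instance, registration and launch through the SCDF shell look roughly like the following sketch (the app name and Maven coordinates are hypothetical); the first launch resolves and caches the artifact, and subsequent launches reuse the cached copy:

        dataflow:> app register --name my-batch --type task --uri maven://com.example:my-batch-job:1.0.0
        dataflow:> task create my-batch-run --definition "my-batch"
        dataflow:> task launch my-batch-run --arguments "--spring.batch.job.names=dailyImportJob"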

    As for the 2-minute bootstrapping window, that comes down to Boot, the number of configuration objects, and of course your business logic; perhaps in your case all of that adds up to 2 minutes.

    how can I download a proprietary jar across intranets?

    There's an option to resolve artifacts from a Maven repository (e.g., an Artifactory instance) hosted behind the firewall, through proxies; we have users on this model for proprietary JARs.
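
    As a sketch, the SCDF server can be pointed at an internal repository and proxy via its Maven properties (the host names and credentials below are placeholders):

        maven.remote-repositories.internal.url=https://artifactory.example.com/libs-release
        maven.remote-repositories.internal.auth.username=deployer
        maven.remote-repositories.internal.auth.password=secret
        maven.proxy.host=proxy.example.com
        maven.proxy.port=8080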

    Each Batch application runs tens of similar jobs with different parameters and can do so within 1400 MiB per app.

    You may want to consider the Composed Task feature. It not only provides the ability to launch child Tasks as directed acyclic graphs (DAGs), but it also allows transitions based on exit codes at each node, to further split and branch into more Tasks. All of this, of course, is automatically recorded at each execution level for further tracking and monitoring from the SCDF Dashboard.
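
    As a rough illustration (the task names here are hypothetical), a composed-task definition in the SCDF shell can chain tasks, fan out in a split, and branch on exit status:

        dataflow:> task create nightly-run --definition "load-job && <transform-job || report-job> && finalize-job"
        dataflow:> task create guarded-run --definition "load-job 'FAILED' -> notify-job '*' -> transform-job"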

    I'd like to register the existing applications in Spring Cloud DataFlow via IP/port, pass job definitions to the Spring Batch applications, and monitor executions (including the ability to stop a job).

    As long as the batch jobs are wrapped into Spring Cloud Task apps, yes, you'd be able to register them in SCDF, use them in the DSL, or drag and drop them onto the visual canvas to create coherent data pipelines. We have a few "batch-job as task" samples here and here.
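
    Once registered and launched, executions can then be inspected from the SCDF shell as well as the Dashboard; a minimal sketch (the execution id is hypothetical):

        dataflow:> task execution list
        dataflow:> job execution list
        dataflow:> job execution display --id 42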