spring, spring-batch, spring-cloud-dataflow, spring-cloud-task

Spring Cloud Dataflow - Parallel Tasks


I have about 16 tasks configured to run in parallel, like <AAA && BBB || CCC && DDD || EEE && FFF || GGG && ......>.

My intention is to have only 3 tasks running at any one time. I don't mind which tasks run first, as long as the order of the sequential tasks is maintained (BBB always runs after AAA, DDD after CCC, etc.).

As per the documentation (https://docs.spring.io/spring-cloud-dataflow/docs/current/reference/htmlsingle/#_configuration_options), I tried setting --split-thread-core-pool-size=3, but it gave me this error:

Split thread core pool size 3 should be equal or greater than the depth of split flows 17. Try setting the composed task property splitThreadCorePoolSize

What should I do here?


Solution

  • Spring Cloud Dataflow's Composed Task Runner uses Spring Batch under the covers, and the way Spring Batch deals with nested splits in flows is not quite optimal.

    That's why nested splits should be avoided if tight control over concurrency limits is required.

    In your case, that should be possible. With

    <AAA && BBB || CCC && DDD || EEE && FFF>
    

    and --split-thread-core-pool-size=2, it works as expected. But with

    <<AAA && BBB> || <CCC && DDD> || <EEE && FFF>>
    

    you should get the message that the thread core pool size must be at least 4.

    In your question, you stated the flow in the first (flat) form. Please make sure that you really enter it in that form. If you enter it in the second (nested) form, SCDF will nevertheless display the flat form in many (but not all) places. The graph property in the task execution view of a started composed task should show the full definition.
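
To check which of the two forms a definition is really in, a small script can count the split brackets. The following is a minimal sketch, not part of SCDF: the helper name `split_stats` is hypothetical, and it only counts `<` and `>` characters rather than parsing the DSL. For the nested example it reports 4 splits, which happens to match the minimum pool size mentioned above, but whether the Composed Task Runner computes its "depth of split flows" this way is an assumption.

```python
from typing import Tuple


def split_stats(definition: str) -> Tuple[int, int]:
    """Return (number of splits, maximum nesting depth) for a definition.

    Hypothetical helper: it only counts '<' and '>' characters and does
    not validate the composed task DSL in any other way.
    """
    count = 0      # how many splits are opened in total
    depth = 0      # how many splits are currently open
    max_depth = 0  # deepest nesting seen so far
    for ch in definition:
        if ch == "<":
            count += 1
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == ">":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced '>' in definition")
    if depth != 0:
        raise ValueError("unbalanced '<' in definition")
    return count, max_depth


flat = "<AAA && BBB || CCC && DDD || EEE && FFF>"
nested = "<<AAA && BBB> || <CCC && DDD> || <EEE && FFF>>"

print(split_stats(flat))    # (1, 1)
print(split_stats(nested))  # (4, 2)
```

A maximum nesting depth greater than 1 means the definition contains nested splits. Run it against the definition shown in the graph property, since other views may display the flattened form.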