Search code examples
parallel-processingbigdataapache-flinkdistributed-computing

Flink: how does the parallelism set in the Jobmanager UI relate to task slots?


Let's say I have 8 task managers with 16 task slots. If I submit a job using the Jobmanager UI and set the parallelism to 8, do I only utilise 8 task slots?

What if I have 8 task managers with 8 slots, and submit the same job with a parallelism of 8? Is it exactly the same thing? Or is there a difference in the way the data is processed?

Thank you.


Solution

  • The total number of task slots in a Flink cluster defines the maximum parallelism, but the number of slots used may exceed the actual parallelism. Consider, for example, this job:

    Flink job

    If run with parallelism of two in a cluster with 2 task managers, each offering 3 slots, the scheduler will use 5 task slots, like this:

    Parallelism of two with two task managers with three slots each

    However, if the base parallelism is increased to six, then the scheduler will do this (note that the sink remains at a parallelism of one in this example):

    Increase parallelism to six

    See Flink's Distributed Runtime Environment for more information.