apache-flink flink-streaming

Equally distribute operators with single parallelism in a multi-parallel Flink application


We have a Flink application that starts with a map operator. Its output stream is routed through filters to multiple window functions, each with a parallelism of 1. We union the outputs of the window functions, pass the result to another map function, and then send it to a sink.

We need both map functions to run with the parallelism of the environment. This works as expected, and the parallelism of each window function is indeed 1. Each task manager is configured with 1 slot.

The issue is that when we set the parallelism of the environment to greater than 1, all of the window function tasks are scheduled on the first task manager. All events therefore flow through that one task manager, which becomes a bottleneck. Is there a way to distribute the window function tasks across multiple task managers when we have parallelism > 1? Would a rebalance() help?


Solution

  • If each task manager has only one slot, and all of the window function tasks are in the same task manager, then all of the window function tasks must be sharing the same slot. That is Flink's default behavior: every operator starts out in the same slot sharing group ("default"), so a single slot can hold one subtask of each operator.

    That being the case, you could use slot sharing groups to force different windows into different slots, and thus onto different task managers.
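A minimal sketch of that suggestion in the DataStream API, assuming a job shaped like the one in the question. The source, the filter predicates, the window logic, and the group names "window-a" and "window-b" are placeholders, not taken from the original job. countWindowAll is used only as a stand-in for a window operator with parallelism 1.

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SlotSharingSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // First map: no explicit parallelism, so it runs with the environment's parallelism.
        // The number sequence is a placeholder source.
        DataStream<Long> mapped = env.fromSequence(1, 1_000)
                .map(x -> x * 2)
                .returns(Types.LONG);

        // Window branch A: a non-keyed window always runs with parallelism 1.
        // Putting it in its own slot sharing group forces it into its own slot.
        SingleOutputStreamOperator<Long> windowA = mapped
                .filter(x -> x % 4 == 0)          // placeholder routing filter
                .countWindowAll(100)
                .reduce(Long::sum)
                .slotSharingGroup("window-a");    // hypothetical group name

        // Window branch B: a different group, so it cannot share a slot with branch A.
        SingleOutputStreamOperator<Long> windowB = mapped
                .filter(x -> x % 4 != 0)          // placeholder routing filter
                .countWindowAll(100)
                .reduce(Long::sum)
                .slotSharingGroup("window-b");    // hypothetical group name

        // Second map: its inputs come from different groups, so the group is set
        // explicitly to "default" to keep it with the first map.
        windowA.union(windowB)
                .map(x -> x + 1)
                .returns(Types.LONG)
                .slotSharingGroup("default")
                .print();

        env.execute("slot-sharing-sketch");
    }
}
```

Each slot sharing group needs its own slots, so the job asks for more of them than before. With environment parallelism p and the two extra groups above, it needs p + 2 slots, which at one slot per task manager means p + 2 task managers.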