Search code examples
apache-flinkflink-streaming

What do terms like Hash, Forward mean in the Flink plan?


This is an image of the Flink plan that appears on the dashboard when I deploy my job. As you can see, the connections between operators are marked as FORWARD/HASH etc. What do they refer to? When is something called a HASH and when is something called a FORWARD?

enter image description here


Solution

  • First of all, as we know, a Flink streaming job will be splitted into several tasks according to its job graph(or DAG). The FORWARD/HASH is a partitioner between the upstream tasks and downstream tasks, which is used to partition data from the input.

    What is Forward? And When does Forward occur?

    This means the partitioner will forwards elements only to the locally running downstream tasks. Forward is the default partitioner if you don't specify any partitioner directly or use the functions with partitioner like reblance/keyBy.

    What is Hash? And When does Hash occur?

    This is a partitioner that partition the records based on the key group index. It occurs when you call keyBy.