How does the hash data exchange works with Flink Tables?

I have two Table Sources listen to a Kafka topic using Flink. On the generated graph, the data exchange for the join between these two tables is "hash". However, I did not found any information how the hash works (on a specific field ?) and how it can be configured ?

Solution

In the case of a Table join, the hash data exchange is based on the equality clause of the join. E.g., if you are doing

SELECT *
FROM A, B
WHERE A.id = B.id

then the stream records from both streams will be hashed on the id field. This will guarantee that all records from both A and B with the same id value will be sent to the same downstream instance. (This is why Flink only supports joins with an equality predicate -- they are much more readily scalable.)

Internally this turns into a keyBy from Flink's DataStream API. A Table/SQL GROUP BY works the same way.