We have a Flink pipeline that aggregates data per "client" by combining records with identical keys ("client-id") within the same window.
The problem is trivially parallelizable, and the input Kafka topic has a few partitions (the same number as the Flink parallelism), each holding a subset of the clients. That is, any given client always lands in one specific Kafka partition.
Does Flink take advantage of this automatically, or will it reshuffle the keys? And if the latter is true, can we somehow avoid the reshuffle and keep the data local to each operator as assigned by the input partitioning?
Note: we are actually using Apache Beam with the Flink runner, but I tried to simplify the question as much as possible. The Beam pipeline uses FixedWindows followed by Combine.perKey.
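For concreteness, here is roughly the shape of the pipeline; the window size, the Sum combine, and the method name are placeholders, not our actual job:

```java
import org.apache.beam.sdk.transforms.Combine;
import org.apache.beam.sdk.transforms.Sum;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

public class PerClientAggregation {
  // events: read from Kafka, keyed by client-id, with a numeric payload.
  static PCollection<KV<String, Long>> perClientTotals(
      PCollection<KV<String, Long>> events) {
    return events
        // Assign each element to a fixed (tumbling) window.
        .apply(Window.<KV<String, Long>>into(
            FixedWindows.of(Duration.standardMinutes(5))))
        // Combine all values sharing the same client-id within each window.
        .apply(Combine.perKey(Sum.ofLongs()));
  }
}
```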
I'm not familiar with the internals of the Flink runner for Beam, but assuming it uses a Flink keyBy, this will involve a network shuffle. This could be avoided, but only rather painfully, by reimplementing the job to use low-level Flink primitives rather than keyed windows and keyed state.
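To make the shuffle concrete, this is the kind of keyed-window job I would expect the runner to produce, assuming it lowers FixedWindows plus Combine.perKey to keyBy plus a window (the Sum aggregation, window size, and method name are illustrative; event-time windows also assume watermarks are configured on the source):

```java
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class KeyedWindowSketch {
  static DataStream<Tuple2<String, Long>> perClientTotals(
      DataStream<Tuple2<String, Long>> events) {
    return events
        // keyBy hash-partitions by key: every element may cross the network
        // to reach the subtask that owns its key group, regardless of how
        // the Kafka source happened to partition the data.
        .keyBy(t -> t.f0)
        .window(TumblingEventTimeWindows.of(Time.minutes(5)))
        // Sum the Long payload (field 1) per client-id per window.
        .sum(1);
  }
}
```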
Flink does offer reinterpretAsKeyedStream, which can be used to avoid unnecessary shuffles, but it can only be applied in situations where the existing partitioning exactly matches what keyBy would do -- and I see no reason to think that would apply here.
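For reference, here is how it is used; it lives in DataStreamUtils and is marked experimental. The sketch assumes a Tuple2 stream keyed on its first field:

```java
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.DataStreamUtils;
import org.apache.flink.streaming.api.datastream.KeyedStream;

public class ReinterpretSketch {
  static KeyedStream<Tuple2<String, Long>, String> asKeyed(
      DataStream<Tuple2<String, Long>> events) {
    // Tells Flink to treat the stream as already keyed, skipping the shuffle.
    // This is only correct if every element is already on the exact subtask
    // that keyBy's key-group hashing would route it to -- Kafka partitioning
    // by client-id does not guarantee that, so results would be wrong here.
    return DataStreamUtils.reinterpretAsKeyedStream(events, t -> t.f0);
  }
}
```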