Search code examples
apache-flinkflink-streaming

Flink non-keyed window with parallelism greater than 1


I am consuming a Kafka topic with more than 50 partitions using FlinkKafkaConsumer(...). I would like to create windows for these partitions. However, I don't expect any shuffling, so I can't use DataStream.keyBy(...). If I call DataStream.windowAll(...), the parallelism will be 1, which also not what I expect.

So is there any ways I can keep the high value of parallelism and no data shuffling at the same time?

Thanks


Solution

  • Without using keyBy, your options become rather limited. You could implement some sort of parallel windowing with a (non-keyed) ProcessFunction, but you won't have access to timers or keyed state, just operator state.