Search code examples
windowapache-flink

Flink eventTime keyed-window not trigger when some keys arrive too slow


flink+kafka, with two topic partitions and 2 Parallelism;

env.setParallelism(2)
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
...
.keyBy(0) //("k",1576726230000)
.timeWindow(Time.seconds(2L))
...

The proplem like blow:

"key1" is producted fast, 1 msg per second

"key2" is producted slow, 1 msg per 4 second

then the window will not trigger by 2 second, because key2's watermark is slow arrived

How to solve it? I have one idea: setParallelism(1), so window can be triggered per 2 second, but if I want to keep Parallelism(2) and keep window triggered by 2 seconds(or 2.5 second timeout to trigger all window), how to make it? Please make some advice, thank you!

Actually, the scene is that in the day message is more fast, and in the night message is too slow, but need to update per 2 second.


Solution

  • One way to solve this problem is to generate watermarks after mixing together events from all of the partitions, so that the slow/idle partition doesn't hold back the overall watermark:

    stream
      .rebalance()
      .assignTimestampsAndWatermarks(...)
      .keyBy(...)
      .timeWindow(...)
    

    This does come at the price of an extra network shuffle.