
Flink ProcessWindowFunction behavior on keyed stream


I have a keyed stream with a tumbling window whose size is 60 seconds.

For example, suppose there are 10 keys in total, named key_1 to key_10. Within the window from 12:00:00 to 12:01:00, data arrives for keys key_1 to key_9, but there is no data for key_10 during that minute.

When the window fires, the ProcessWindowFunction is called. I'm wondering how it behaves in this case, and I have two guesses:

  1. The function is called for every key, whether or not that key has data (in this case, key_1 to key_10).
  2. The function is called only for the keys that have data in the window (in this case, key_1 to key_9).

Solution

  • Windows are created lazily, when events are assigned to them. So there will be no window for key_10 for the interval in which key_10 has no events.

    Since there is no window for key_10, the ProcessWindowFunction will not be called for it, and you won't get any results for key_10 for that timeframe. In other words, your second guess is correct.

    This is covered in the documentation, in the section on surprises with the window API, under "No Results for Empty TimeWindows".
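To make the lazy-creation behavior concrete, here is a small, self-contained Java model of keyed tumbling event-time windows. It is a sketch, not Flink code, and uses no Flink APIs: the class `LazyKeyedWindows`, its `add` and `fire` methods, and the `ProcessFn` callback are hypothetical stand-ins, with `ProcessFn` playing the role of `ProcessWindowFunction.process`. Assuming the model captures the documented behavior, state for a (key, window) pair exists only after an element for that key lands in that window, so firing the 12:00:00 to 12:01:00 window (modeled as 0 to 60 000 ms) invokes the callback for key_1 to key_9 only.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Simplified, single-threaded model of keyed tumbling event-time windows.
 * Hypothetical illustration only; NOT Flink's implementation.
 */
public class LazyKeyedWindows {
    /** Hypothetical stand-in for ProcessWindowFunction.process. */
    @FunctionalInterface
    public interface ProcessFn {
        void process(String key, long windowStart, long windowEnd, List<Long> elements);
    }

    private final long sizeMs;
    // key -> (windowStart -> buffered timestamps); entries appear only on the first element
    private final Map<String, TreeMap<Long, List<Long>>> state = new TreeMap<>();

    public LazyKeyedWindows(long sizeMs) { this.sizeMs = sizeMs; }

    /** Assigns an element to its tumbling window, creating that window on demand. */
    public void add(String key, long timestampMs) {
        long windowStart = timestampMs - Math.floorMod(timestampMs, sizeMs);
        state.computeIfAbsent(key, k -> new TreeMap<>())
             .computeIfAbsent(windowStart, w -> new ArrayList<>())
             .add(timestampMs);
    }

    /**
     * Fires every existing window whose last timestamp (end - 1) is <= the watermark.
     * The callback runs only for windows that exist, i.e. only for keys that
     * received at least one element in that window.
     */
    public void fire(long watermarkMs, ProcessFn fn) {
        for (Map.Entry<String, TreeMap<Long, List<Long>>> e : state.entrySet()) {
            Iterator<Map.Entry<Long, List<Long>>> it = e.getValue().entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<Long, List<Long>> w = it.next();
                long end = w.getKey() + sizeMs;
                if (end - 1 > watermarkMs) break; // windows are sorted; later ones are not ready
                fn.process(e.getKey(), w.getKey(), end, w.getValue());
                it.remove(); // drop the window's state once it has fired
            }
        }
    }

    public static void main(String[] args) {
        LazyKeyedWindows windows = new LazyKeyedWindows(60_000L);
        long windowStart = 0L; // stands in for 12:00:00
        for (int i = 1; i <= 9; i++) { // key_1..key_9 receive data; key_10 receives none
            windows.add("key_" + i, windowStart + i * 1_000L);
        }
        List<String> called = new ArrayList<>();
        windows.fire(windowStart + 60_000L - 1, (key, start, end, elems) -> called.add(key));
        System.out.println(called.size() + " calls; key_10 called? " + called.contains("key_10"));
    }
}
```

Running `main` should print `9 calls; key_10 called? false`. The firing condition (watermark at or past `end - 1`) is meant to mirror how an event-time trigger fires at a window's maximum timestamp; everything else, such as purging state right after firing, is a simplification.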