apache-flink, flink-streaming

Flink timer onTimer events - do they force redistribution of the stream?


I have a KeyedProcessFunction (named Stateless) that processes data per key (e.g. transaction id). Stateless is lightweight and doesn't use state; it only sets a timer.

Upstream I'm reading from Kafka (partitioned; the key is hash + modulo of the transaction id). Downstream is not important. I've read the Keyed State documentation and I understand that for stateful processing the data will be redistributed (shuffled) and probably checkpointed to guarantee consistency, but it's unclear to me how this works for timers.

When I don't use state but do register a timer (which I assume is scoped to the key context):

  1. Are events redistributed across tasks by key, and checkpointed, as in stateful processing?
  2. Are timer expiry events redistributed across tasks, or are they executed on the machine that registered them?

I've read the documentation and code samples.


Solution

  • In pretty much every respect timers behave in the same way as state. Timers are keyed, and are only available on keyed streams. They are scoped to the specific operator in which they are created -- typically a KeyedProcessFunction. Furthermore, timers are stateful in the sense that they are included in checkpoints.

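    I wouldn't say that timers force a redistribution of the stream, but rather that they require it: the keyBy in front of the function is what repartitions the stream by key, and timers can only be registered on the resulting keyed stream. When a timer fires, onTimer runs in the same parallel instance that owns the key, so expiry events are not shuffled again.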
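
To make the routing concrete, here is a toy model in plain Java. It is not Flink code and does not use the Flink API. It only mimics the idea that a key maps to a key group, a key group maps to one parallel subtask, and a timer lives in the subtask that owns its key. The class and method names, the parallelism values, and the use of `String.hashCode()` as the hash are all assumptions made for illustration (Flink applies a murmur hash to the key's hash code).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of keyed timers. Hypothetical names; not the Flink API. */
class TimerRoutingSketch {
    // Assumed values, chosen only for illustration.
    static final int MAX_PARALLELISM = 128;
    static final int PARALLELISM = 4;

    /** Stand-in hash: maps a key to a key group in [0, MAX_PARALLELISM). */
    static int keyGroupOf(String key) {
        return Math.floorMod(key.hashCode(), MAX_PARALLELISM);
    }

    /** Maps the key's key group to the index of the subtask that owns it. */
    static int subtaskOf(String key) {
        return keyGroupOf(key) * PARALLELISM / MAX_PARALLELISM;
    }

    /** One parallel instance of the keyed operator, holding its own timers. */
    static class Subtask {
        // fire time -> keys that have a timer at that time
        final TreeMap<Long, List<String>> timers = new TreeMap<>();
        // record of "key@fireTime" for every timer that fired here
        final List<String> fired = new ArrayList<>();

        /** Like processElement: registers a timer for the current key. */
        void processElement(String key, long now, long delay) {
            timers.computeIfAbsent(now + delay, t -> new ArrayList<>()).add(key);
        }

        /** Fires, locally, every timer whose time is <= now (like onTimer). */
        void advanceTo(long now) {
            while (!timers.isEmpty() && timers.firstKey() <= now) {
                Map.Entry<Long, List<String>> due = timers.pollFirstEntry();
                for (String key : due.getValue()) {
                    fired.add(key + "@" + due.getKey());
                }
            }
        }
    }

    static Subtask[] newSubtasks() {
        Subtask[] subtasks = new Subtask[PARALLELISM];
        for (int i = 0; i < PARALLELISM; i++) {
            subtasks[i] = new Subtask();
        }
        return subtasks;
    }

    /** Simulates keyBy: hands the element to the subtask owning its key. */
    static int route(Subtask[] subtasks, String key, long now, long delay) {
        int index = subtaskOf(key);
        subtasks[index].processElement(key, now, delay);
        return index;
    }

    public static void main(String[] args) {
        Subtask[] subtasks = newSubtasks();
        int owner = route(subtasks, "tx-1", 0L, 100L);
        for (Subtask s : subtasks) {
            s.advanceTo(200L);
        }
        System.out.println("owner subtask: " + owner);
        System.out.println("fired there:   " + subtasks[owner].fired);
    }
}
```

In this model the only shuffle happens in `route`, which plays the role of keyBy. The timer is stored and fired inside the owning subtask, which matches the answer above: timers need the stream to be keyed, but firing a timer does not move anything between tasks.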