Search code examples
apache-flinkflink-streaming

Implications of merge method in AggregateFunction


I am trying to understand the AggregateFunction in Flink which is described here. Totally it has four methods namely,

  1. createAccumulator
  2. add
  3. getResult
  4. merge

From my understanding,

createAccumulator method is invoked when the first element enters into a new window and newly created instance will be used further

add method is invoked to reduce the result based on definition and this uses the instance which is created in createAccumulator method

getResult method is invoked when a window is closed and returns the available result

Whether my understanding about the above methods are correct or not? Finally, what is the use-case of merge method and when it is used/invoked? The definition available here is not clear for me.


Solution

  • The merge method is called when two windows are merged. This applies to session windows, which are merged whenever two sessions are collapsed into one by the arrival of an event that bridges the gap between the sessions. When this occurs, the aggregated results-to-date of both sessions are combined by calling merge.