
Flink: Compute only at the end of a session window


I have an AggregateFunction which computes an average over a sequence of events in a WindowedStream.
The caveat here is that the average needs to be computed over event pairs which can arrive out of order (or not at all).

In other words, I need to sort the data before the computation because the sequence is important.

I can do the sorting in getResult, but that function is called on every event in the window, which doesn't make sense performance-wise. I could also do this with flink-cep, but I'd like to avoid it for the same reason.

Ideally, I'd like to only compute the average at the very end (where I can sort the data once), when the window is closed.

Is there some sort of handler for this? The closest thing I found was triggers, but they have no callback for the closing of the window.

Thanks

Edit:
I ended up using a ProcessWindowFunction with incremental aggregation. From the Flink documentation:

A ProcessWindowFunction can be combined with either a ReduceFunction, an AggregateFunction, or a FoldFunction to incrementally aggregate elements as they arrive in the window. When the window is closed, the ProcessWindowFunction will be provided with the aggregated result. This allows it to incrementally compute windows while having access to the additional window meta information of the ProcessWindowFunction.


Solution

  • Instead of an AggregateFunction, you can use a ProcessWindowFunction without incremental aggregation. Flink then buffers the window's elements in state, calls the function once when the window is triggered, and passes it an Iterable containing the window's contents and a Collector you can use to emit results.

    When the ProcessWindowFunction is called, you can sort the contents once and produce whatever output you want.
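As a rough illustration of the sort-then-compute step, here is a minimal sketch in plain Java with no Flink dependency. The question does not show its event type or define what a "pair" is, so everything here is an assumption: the `Event` class, the `SortedPairAverage` name, and the rule that pairs are consecutive events after sorting by timestamp, with a trailing unpaired event ignored. In a real job this logic would sit in the body of `ProcessWindowFunction#process(key, context, elements, out)`, with the result passed to `out.collect(...)`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: holds the work a ProcessWindowFunction would do
// once per window, after the window has fired.
final class SortedPairAverage {

    // Hypothetical event type; the original question does not show one.
    static final class Event {
        final long timestamp;

        Event(long timestamp) {
            this.timestamp = timestamp;
        }
    }

    // Sorts the window contents once, then averages the time gap within
    // each consecutive pair (events 0 and 1, events 2 and 3, ...).
    // A trailing event without a partner is skipped.
    // Returns NaN when the window holds no complete pair.
    static double averagePairGap(Iterable<Event> elements) {
        // Copy the Iterable into a list so it can be sorted.
        List<Event> sorted = new ArrayList<>();
        for (Event e : elements) {
            sorted.add(e);
        }
        sorted.sort(Comparator.comparingLong((Event e) -> e.timestamp));

        double sum = 0.0;
        int pairs = 0;
        for (int i = 0; i + 1 < sorted.size(); i += 2) {
            sum += sorted.get(i + 1).timestamp - sorted.get(i).timestamp;
            pairs++;
        }
        return pairs == 0 ? Double.NaN : sum / pairs;
    }
}
```

For events with timestamps 30, 10, 50, 20, 45 this sorts to 10, 20, 30, 45, 50, forms the pairs (10, 20) and (30, 45), and returns 12.5. The trade-off against an AggregateFunction is that every element of the window is kept in state until the window fires, which is the price of being able to sort exactly once.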