Search code examples
apache-flinkflink-streaming

How can I get the moving sum of streaming events?


I have a source that emits integer events.

For each new integer, I would like to sum it with all the integers that got streamed in the previous hour and emit that value to the next step.

What is the idiomatic way of calculating and then emitting the sum of the current event's integer combined with integers from all the events in the preceding hour? I can think of two options, but feel I am missing something:

  • Use a sliding window of size one hour that slides by one millisecond. This would ensure there is always a window that spans from the latest event back one hour exactly.
  • Create my own process function that keeps track of the previous integers that are less than or equal to one hour old. Use this state to do my calculations.

Solution

  • You can do that with Flink SQL using an over window. Something like this:

    SELECT
        SUM(*) OVER last_hour AS rolling_sum
    FROM Events
    WINDOW last_hour AS (
        ORDER BY eventTime
        RANGE BETWEEN INTERVAL '1' HOUR PRECEDING AND CURRENT ROW
    )
    

    See OVER Aggregation from the Flink SQL docs for more info. You could also use the Table API, see Over Windows.