One can use windows in Flink in two different manners
SELECT key, MAX(value)
FROM table
GROUP BY key, TUMBLE(ts, INTERVAL '5' MINUTE)
and
SELECT key, MAX(value) OVER w
FROM table
WINDOW w AS (PARTITION BY key ORDER BY ts ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
I wonder if those two mechanisms can express the same thing or what's the main difference and possible use cases for both?
Both queries compute different results that correspond to the semantics of regular SQL. So the difference is not Flink specific but defined by the SQL Standard.
The first query
SELECT key, MAX(value)
FROM table
GROUP BY key, TUMBLE(ts, INTERVAL '5' MINUTE)
groups records per key
and 5 minute buckets. Every 5 minutes the query produces one row per key
value with the maximum value
. For each group, multiple rows are aggregated to a single row.
The second query
SELECT key, MAX(value) OVER w
FROM table
WINDOW w AS (PARTITION BY key ORDER BY ts ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
produces one row for every row of the input (table
). A result row has the maximum value
for the key
value that was observed so far (rows are ordered by ts
). Note that multiple rows are not aggregated to a single row; each input row results in one output row. Moreover, the range of the maximum aggregation can be more than 5 minutes. In fact it's the whole key
partition in this example.