Flink 1.14, Java, Table API + DataStream API (toDataStream / toAppendStream).
I'm trying to: read events from Kafka, aggregate them hourly (sum, count, etc.), and upsert the results to Cassandra as soon as new events arrive. In other words: create new records, or recalculate already existing records, on every new event and sink the results to Cassandra immediately. The aim is to see continuously updating sum and count values for primary-keyed records. For this purpose I'm using SQL:
...
TUMBLE(TABLE mytable, DESCRIPTOR(action_datetime), INTERVAL '1' HOURS)
...
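For context, a hypothetical version of the full query might look like the following. The table name `mytable` and the `action_datetime` column come from the snippet above; the sink name, key column, and aggregated column are assumptions, not from the original post:

```sql
-- Illustrative hourly tumbling-window aggregation.
-- `cassandra_sink`, `user_id`, and `amount` are assumed names.
INSERT INTO cassandra_sink
SELECT
  user_id,
  window_start,
  SUM(amount) AS total_amount,
  COUNT(*)    AS event_count
FROM TABLE(
  TUMBLE(TABLE mytable, DESCRIPTOR(action_datetime), INTERVAL '1' HOURS)
)
GROUP BY user_id, window_start, window_end;
```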
But the task only sends results to Cassandra after the window interval expires (every 1 hour). I know it works as described in the docs:
Unlike other aggregations on continuous tables, window aggregation do not emit intermediate results but only a final result, the total aggregation at the end of the window.
Question: how can I emit intermediate results to the sink as soon as a new event arrives, instead of waiting 1 hour for the window to close?
Here are some options. Perhaps none of them is precisely what you want:
(1) Use CUMULATE instead of TUMBLE. This won't give you updated results with every new event, but you can have the result updated frequently -- e.g., once a minute.
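A sketch of option (1): the window still spans one hour, but CUMULATE emits an updated partial aggregate at every step, here once a minute. The aggregated column is an assumed name:

```sql
-- CUMULATE(data, time attribute, step, max size):
-- emits the aggregate-so-far every minute until the hour closes.
SELECT
  window_start,
  window_end,
  SUM(amount) AS total_amount,
  COUNT(*)    AS event_count
FROM TABLE(
  CUMULATE(TABLE mytable, DESCRIPTOR(action_datetime),
           INTERVAL '1' MINUTES, INTERVAL '1' HOURS)
)
GROUP BY window_start, window_end;
```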
(2) Use OVER aggregation. This will give you a continuously updated aggregation over the previous 60 minutes (aligned to each event, rather than to the epoch).
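A sketch of option (2): an OVER aggregation produces one output row per input row, each covering the 60 minutes ending at that row's event time. Column names other than `action_datetime` are assumptions:

```sql
-- Sliding 60-minute aggregate, updated with every incoming row;
-- the range is anchored to each event, not to wall-clock hours.
SELECT
  user_id,
  action_datetime,
  SUM(amount) OVER w AS total_amount,
  COUNT(*)    OVER w AS event_count
FROM mytable
WINDOW w AS (
  PARTITION BY user_id
  ORDER BY action_datetime
  RANGE BETWEEN INTERVAL '1' HOUR PRECEDING AND CURRENT ROW
);
```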
(3) Use DataStream windows with a custom Trigger that fires with each event. This will provide the behavior you've asked for, but will require a rewrite using the DataStream API.
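A minimal sketch of option (3), assuming the Flink 1.14 DataStream API (Flink dependencies required on the classpath; the class name is illustrative). The trigger fires the window on every element, so the sink sees an updated aggregate per event, and purges state when the hour actually ends:

```java
// Illustrative custom trigger: FIRE on every element, FIRE_AND_PURGE
// when the event-time timer at the end of the window goes off.
import org.apache.flink.streaming.api.windowing.triggers.Trigger;
import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;

public class PerEventTrigger extends Trigger<Object, TimeWindow> {
    @Override
    public TriggerResult onElement(Object element, long timestamp,
                                   TimeWindow window, TriggerContext ctx) {
        // Make sure the window still closes at its natural end...
        ctx.registerEventTimeTimer(window.maxTimestamp());
        // ...but emit the current aggregate for every new event.
        return TriggerResult.FIRE;
    }

    @Override
    public TriggerResult onEventTime(long time, TimeWindow window,
                                     TriggerContext ctx) {
        // End of the hour: emit the final result and clear window state.
        return time == window.maxTimestamp()
                ? TriggerResult.FIRE_AND_PURGE
                : TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onProcessingTime(long time, TimeWindow window,
                                          TriggerContext ctx) {
        return TriggerResult.CONTINUE;
    }

    @Override
    public void clear(TimeWindow window, TriggerContext ctx) {
        ctx.deleteEventTimeTimer(window.maxTimestamp());
    }
}
```

You would attach it to an hourly event-time window, e.g. `.window(TumblingEventTimeWindows.of(Time.hours(1))).trigger(new PerEventTrigger())`, followed by your aggregate function and the Cassandra sink.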