apache-kafka, event-handling, apache-flink

Apache Flink for daily aggregation report


We have Flink running over Kafka for various aggregations. One of the streams we analyse is order-audits (basically, every state change of an order is emitted as an event).

Each order event looks something like this:

{
  "id" : "ord-1",
  "merchant_id" : "merchant-a",
  "status" : "created",
  ...
  "updated_at" : 
  "event_time" : 
}

I want to run an aggregation at the merchant level for a given day.

Something like

{ 
   "merchant_id" : "merchant-a",
   "date" : "2019-07-01",
   "started" : 10,
   "completed" : 13,
   "cancelled" : 3
}

Is Flink a good fit for this type of aggregation? (Most of the examples I've found are straightforward aggregations.)

Sorry if this is repeated/naive. Thanks!


Solution

  • Sure, that kind of analysis is easily done with Flink. You'll probably find it easiest to do this with Flink's SQL API, as the learning curve there is gentle -- once you get set up, it's very straightforward, assuming you know some SQL.

    Take a look at https://github.com/ververica/sql-training/ for a guided introduction.
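
    As a rough sketch of what this could look like in Flink SQL: assuming the order-audit stream is registered as a table named `order_events` with columns `merchant_id` and `status`, and with `event_time` declared as the event-time attribute (all of these names are illustrative, taken from the JSON above), a daily per-merchant rollup using a tumbling event-time window might be written like this:

    ```sql
    -- Hypothetical table/column names; adjust to your actual schema.
    -- One result row per merchant per day, counting orders by status.
    SELECT
      merchant_id,
      TUMBLE_START(event_time, INTERVAL '1' DAY) AS window_day,
      SUM(CASE WHEN status = 'created'   THEN 1 ELSE 0 END) AS started,
      SUM(CASE WHEN status = 'completed' THEN 1 ELSE 0 END) AS completed,
      SUM(CASE WHEN status = 'cancelled' THEN 1 ELSE 0 END) AS cancelled
    FROM order_events
    GROUP BY
      merchant_id,
      TUMBLE(event_time, INTERVAL '1' DAY)
    ```

    The tumbling window groups events into non-overlapping one-day buckets by event time, so each merchant/day pair emits exactly one row once the window closes (subject to your watermark configuration for late events).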