apache-flink · flink-streaming · amazon-kinesis · flink-sql

Set up a Flink application with multiple different SQL queries (aggregation)


I need to build a pipeline with different column aggregations from the same source, say, one by userId, another by productId, etc. I also want aggregations at different granularities, say, hourly and daily. Each aggregation will have a different sink, say, a different NoSQL table.

It seems simple to build a SQL query with the Table API. But I would like to reduce the operational overhead of managing too many Flink apps, so I am thinking of putting all the different SQL queries in one PyFlink app.

This is the first time I have built a Flink app, so I am not sure how feasible this is. In particular, I'd like to know:

  • Reading the Flink docs, I see there are concepts of application vs. job. So I am curious: is each SQL aggregation query a separate Flink job?
  • Will the overall performance degrade because of too many queries in one Flink app?
  • Since the queries share the same source (Kinesis), will each query get a copy of the source? Basically, I want to make sure each event is processed by each SQL aggregation query.

Thanks!


Solution

  • You can put multiple queries into one job if you use a statement set: https://docs.ververica.com/user_guide/sql_development/sql_scripts.html
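As a sketch of what a statement set looks like: the `INSERT INTO` statements below are grouped so Flink plans them as a single job, and the planner can reuse the shared source scan, so every event reaches every aggregation. All table and column names here (`kinesis_events`, `user_hourly_sink`, `product_daily_sink`, `eventTime`, etc.) are illustrative placeholders, not names from your setup:

```sql
-- One Flink job containing two windowed aggregations over the same
-- Kinesis-backed source table. Names are placeholders.
EXECUTE STATEMENT SET
BEGIN
  -- Hourly aggregation keyed by userId
  INSERT INTO user_hourly_sink
  SELECT userId,
         TUMBLE_START(eventTime, INTERVAL '1' HOUR) AS window_start,
         COUNT(*) AS event_count
  FROM kinesis_events
  GROUP BY userId, TUMBLE(eventTime, INTERVAL '1' HOUR);

  -- Daily aggregation keyed by productId
  INSERT INTO product_daily_sink
  SELECT productId,
         TUMBLE_START(eventTime, INTERVAL '1' DAY) AS window_start,
         COUNT(*) AS event_count
  FROM kinesis_events
  GROUP BY productId, TUMBLE(eventTime, INTERVAL '1' DAY);
END;
```

Since you mentioned PyFlink: the equivalent there is `TableEnvironment.create_statement_set()`, to which you add each `INSERT INTO ...` via `add_insert_sql(...)` and then call `execute()` once to submit everything as one job.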