
Data Persistence For Apache Flink SQL Streaming Queries


I want to use Flink SQL to query streaming data. My questions are:

  • Can I apply SQL queries dynamically without having to restart Flink?
  • If I create a table from a Kafka source, will Flink actually create the table and persist the incoming data in it forever, or will it just delete the rows once they are processed?

I am new to Flink, so any help on this is highly appreciated.

I have already read several blog posts on Flink SQL but did not find an answer to whether the data is persisted in the table or not.


Solution

  • Can I apply SQL queries dynamically without having to restart Flink?

    Each query creates a new Flink job. Jobs for streaming queries run indefinitely unless their input is bounded or they are explicitly stopped.

    You can have a Flink session cluster that is always running (it never restarts unless something fails) and use its resources to run those queries/jobs. New queries/jobs can come and go without restarting that session cluster.

  • If I create a table from a Kafka source, will Flink actually create the table and persist the incoming data in it forever, or will it just delete the rows once they are processed?

    Flink's tables don't have any storage of their own; the data is persisted only in the table's backing store. If you create a table that is backed by a Kafka topic and then query that table, this has no effect on the retention policy of the underlying Kafka topic, and the Row objects that correspond to the events stored in that topic exist only while they are being processed.
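To make the first answer concrete (each query runs as its own job on a long-lived session cluster), here is a toy model in plain Python. It is not Flink's API: `SessionCluster`, `Job`, `submit`, `cancel` and `tick` are hypothetical names invented for illustration. It shows one cluster object that keeps running while jobs come and go: a job over a bounded input finishes on its own, while a job over an unbounded input keeps running until it is cancelled.

```python
"""Toy model (NOT Flink's API): a long-lived session cluster that runs one
job per submitted query. All class and method names are hypothetical."""
import itertools
from dataclasses import dataclass, field
from typing import Dict, Iterator


@dataclass
class Job:
    query: str
    source: Iterator[int]
    status: str = "RUNNING"
    processed: int = 0

    def step(self) -> None:
        """Process one record; a bounded source eventually finishes the job."""
        if self.status != "RUNNING":
            return
        try:
            next(self.source)
            self.processed += 1
        except StopIteration:
            self.status = "FINISHED"


@dataclass
class SessionCluster:
    """One cluster object for its whole lifetime; jobs are added and stopped
    without recreating it."""
    jobs: Dict[int, Job] = field(default_factory=dict)
    _ids: Iterator[int] = field(default_factory=itertools.count, repr=False)

    def submit(self, query: str, source: Iterator[int]) -> int:
        """Each submitted query becomes a new, independent job."""
        job_id = next(self._ids)
        self.jobs[job_id] = Job(query, source)
        return job_id

    def cancel(self, job_id: int) -> None:
        """Stop one job; the cluster and its other jobs keep running."""
        self.jobs[job_id].status = "CANCELED"

    def tick(self, n: int = 1) -> None:
        """Advance every job by up to n records."""
        for _ in range(n):
            for job in self.jobs.values():
                job.step()


if __name__ == "__main__":
    cluster = SessionCluster()
    bounded = cluster.submit("SELECT * FROM bounded_input", iter(range(3)))
    streaming = cluster.submit("SELECT * FROM kafka_table", itertools.count())
    cluster.tick(10)
    print(cluster.jobs[bounded].status)    # FINISHED
    print(cluster.jobs[streaming].status)  # RUNNING
    cluster.cancel(streaming)
    # A new query is accepted by the same, never-restarted cluster.
    another = cluster.submit("SELECT * FROM kafka_table", itertools.count())
    print(another, cluster.jobs[streaming].status)  # 2 CANCELED
```

The model only mirrors the lifecycle described in the answer; in Flink itself, queries are typically submitted through a client such as the SQL Client, and the cluster handles scheduling and resources.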
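For the second answer (a Flink table has no storage of its own), the following toy model, again plain Python rather than any Flink or Kafka API, illustrates the point. `Topic`, `TableDefinition` and `scan` are hypothetical stand-ins, and `retention_ms` is only loosely modeled on Kafka's `retention.ms` topic setting. The table definition holds just a schema and a pointer to the topic (roughly what a `CREATE TABLE ... WITH ('connector' = 'kafka', ...)` statement records), scanning it yields short-lived row tuples, and the topic's records and retention are untouched by the query.

```python
"""Toy model (NOT Flink or Kafka APIs): a table is only a definition that
points at external storage; the records themselves live in the topic."""
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple


@dataclass
class Topic:
    """Stand-in for a Kafka topic; retention is a property of the topic only."""
    name: str
    retention_ms: int
    records: List[dict]


@dataclass(frozen=True)
class TableDefinition:
    """Stand-in for a Kafka-backed table: schema plus topic name, no rows."""
    name: str
    columns: Tuple[str, ...]
    topic: str


def scan(table: TableDefinition, topics: Dict[str, Topic]) -> Iterator[tuple]:
    """Yield one transient row per record; nothing is copied into the table
    and nothing is removed from the topic."""
    for record in topics[table.topic].records:
        yield tuple(record[c] for c in table.columns)


if __name__ == "__main__":
    topics = {
        "orders": Topic("orders", retention_ms=7 * 24 * 3600 * 1000,
                        records=[{"id": 1, "amount": 9.5}, {"id": 2, "amount": 3.0}]),
    }
    orders = TableDefinition("orders", ("id", "amount"), "orders")
    print(sum(amount for _, amount in scan(orders, topics)))  # 12.5
    print(len(topics["orders"].records))  # 2
```

Running the query a second time reads the records from the topic again, because the table itself never held them; how long they remain available is decided solely by the topic's retention.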