apache-flink, flink-streaming

Flink: Questions about checkpoints and savepoints


Below are my queries regarding Flink.

  1. Can checkpoints and savepoints be stored in external data stores such as RocksDB, or is it only the state itself that can be stored in RocksDB?
  2. Does the state backend affect checkpointing? If so, in what way?
  3. What is the State Processor API? Is it directly related to the savepoints and checkpoints we store? What benefits does the State Processor API offer that a normal savepoint does not?

For question 3, please be as descriptive as possible. I am interested in learning the State Processor API, but I would like to understand its applications in depth and the scenarios in which it is indispensable.


Solution

    1. Checkpoints and savepoints can only be written to storage that satisfies the requirements of Flink's filesystem abstraction. You want something durable and redundant, such as S3 or HDFS. RocksDB is not supported as a data store for checkpoints or savepoints; as a state backend, it only holds the working state on local disk.

    2. The state backend is involved in checkpointing, and checkpoints are written in a state-backend-specific format. The most significant difference between the heap-based and RocksDB-based state backends with regard to checkpointing is that only the RocksDB state backend supports incremental checkpointing, where each checkpoint uploads only what has changed since the previous one rather than the full state.

    3. The State Processor API allows you to write applications that read and write savepoints (and externalized checkpoints). Typical uses include inspecting your application's state for analysis or debugging, performing state migrations, and bootstrapping state for new applications.
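
To make point 2 more concrete, here is a toy model in plain Java of why incremental checkpoints tend to be cheaper. It is only a sketch, not Flink or RocksDB code: the class name, method names, and file names are all hypothetical. It assumes that the working state lives in immutable files (as RocksDB's SST files do), so an incremental checkpoint only needs to upload files that durable storage has not seen yet, while a full checkpoint writes everything again. The model ignores compaction and the cleanup of files that are no longer referenced.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy model only; every name here is hypothetical and not part of Flink. */
class IncrementalCheckpointSketch {

    /** Names of files already present in durable checkpoint storage. */
    private final Set<String> uploaded = new HashSet<>();

    /** Full checkpoint: every live file is written again, regardless of history. */
    List<String> fullCheckpoint(Set<String> liveFiles) {
        uploaded.addAll(liveFiles);
        return new ArrayList<>(new TreeSet<>(liveFiles)); // sorted for stable output
    }

    /** Incremental checkpoint: only files not uploaded before are written. */
    List<String> incrementalCheckpoint(Set<String> liveFiles) {
        List<String> toUpload = new ArrayList<>();
        for (String file : new TreeSet<>(liveFiles)) {
            if (uploaded.add(file)) { // add() returns true only for unseen files
                toUpload.add(file);
            }
        }
        return toUpload;
    }

    public static void main(String[] args) {
        IncrementalCheckpointSketch storage = new IncrementalCheckpointSketch();
        Set<String> live = new TreeSet<>(Arrays.asList("000001.sst", "000002.sst"));

        // The first incremental checkpoint has no history, so it uploads both files.
        System.out.println(storage.incrementalCheckpoint(live));

        // One new file appears; only that file is uploaded.
        live.add("000003.sst");
        System.out.println(storage.incrementalCheckpoint(live));

        // A full checkpoint would write all three files again.
        System.out.println(storage.fullCheckpoint(live));
    }
}
```

In this model the second incremental checkpoint uploads one file where a full checkpoint uploads three. In Flink 1.x releases, incremental checkpointing for the RocksDB state backend is switched on with the `state.backend.incremental: true` configuration option; check the documentation for the version you run, since option names have changed between major releases.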