Search code examples
apache-flinkflink-streamingrocksdbcheckpoint

How to store checkpoint into remote RocksDB in Apache Flink


I know that there are three kinds of state backends in Apache Flink: MemoryStateBackend, FsStateBackend and RocksDBStateBackend.

MemoryStateBackend stores the checkpoints into local RAM, FsStateBackend stores the checkpoints into local FileSystem, and RocksDBStateBackend stores the checkpoints into RocksDB. I have some questions about the RocksDBStateBackend.

As my understanding, the mechanism of RocksDBStateBackend has been embedded into Apache Flink. The rocksDB is a kind of key-value DB. So If I'm right, it means that Flink will store all checkpoints into the embedded rocksDB, which uses the local disk.

If so, I think the disk could be exhausted in some cases because of the checkpoints stored into the rocksDB. Now I'm thinking if it is possible to configure a remote rocksDB to store these checkpoints? If it is possible, should we worry about the remote rocksDB crashing? If the remote rocksDB crashes, the jobs of Flink can not continue working, right?


Solution

  • There is no option to use an external or remote RocksDB with Apache Flink. RocksDB is an embedded key-value store with a local instance in each task manager.

    Several points:

    • Flink makes a strong distinction between the working state, which is always local (for good performance), and state snapshots (checkpoints and savepoints), which are not local (for reliability they should be stored in a distributed file system).

    • The RocksDBStateBackend uses the local disk for working state. The other two state backends keep their working state on the Java heap.

    • The checkpoint coordinator arranges for all of these slices of data scattered across all of the task managers to be collected together into complete checkpoints that are stored elsewhere. In the case of the MemoryStateBackend those checkpoints are stored on the JobManager heap; for the other two, they are in a distributed file system.

    You want to configure RocksDB to use the fastest available local file system. Try to use locally attached SSDs, and avoid network-attached storage (such as EBS). Do not try to use a distributed file system such as S3 as RocksDB's local storage.

    state.backend.rocksdb.localdir controls where each local RocksDB stores its working state.

    The parameter to the RocksDBStateBackend constructor controls where the checkpoints are stored. E.g., using S3 as recommended by @ezequiel is the obvious choice on AWS.