Search code examples
apache-flinkflink-streamingdistributed-system

Does Flink task managers ever talk to the deep store service?


I am interested in understanding how Flink performs checkpoints (or savepoints) a little better. Here is my points of curiosity:

Does Flink Job manager gather all the state from the task managers and then push this data to the deep store service (like GCP or S3)? Or, does each of the task managers and job managers push state independent of each other.


Solution

  • Overall the design is that each task manager writes the state it is managing, and the job manager writes the metadata. Each of these machines (or containers) writes directly to the distributed filesystem, and they all need to use the same URI.

    However, the details are a little more complex. A checkpoint (or savepoint) consists of many files, with each state chunk being written into its own file. Except that any state chunks smaller than state.backend.fs.memory-threshold are collected by the job manager and written inline into the metadata. This helps to avoid writing lots of small files, which can be a problem with object stores such as S3.