Search code examples
apache-flink

Flink: local dir for state.checkpoints.dir


I was trying to understand the implications of using local dir e.g. file:///checkpoints/ for state.checkpoints.dir. My confusion is that 1) there might be multiple TaskManagers, does that mean each will save its own checkpoints to its local disk? 2) does this work in the environment like Kubernetes? because Pods might be moved around in the cluster.


Solution

  • This won't work. state.checkpoints.dir must be a URI that is accessible to every machine in the cluster, i.e., some sort of distributed filesystem. This is necessary for recovery in situations in which a task manager has failed, or when state needs to be redistributed for rescaling.

    You may also want each TaskManager to additionally keep a copy of its state locally for faster recovery; see Task Local Recovery for info on that option.