Assume we have 2 Job Managers (ZooKeeper for HA) and 3 Task Managers. I have configured FsStateBackend for checkpointing. I assume that the FsStateBackend runs in each of the Task Managers which maintains the state in the memory. Upon checkpoint, the state is persisted in the path which we have configured(file:/data). Basically I have configured the path pointing to the local file system. So each of the Task Managers has its own local disk storage where checkpointed data is persisted. As per my understanding a small meta data is sent to Job Manager on checkpointing.
Thank you
You should always use a distributed file system for checkpointing. Something like HDFS, S3, GFS, NFS, Ceph, etc. Furthermore, the storage path used must be accessible from all participating processes/nodes (i.e. all Task Managers and Job Managers).
Otherwise, as you've pointed out, the checkpoint data would be lost if a local disk failed.
The Job Manager has complete knowledge concerning checkpointing, and if you have HA configured, this information is stored in the configured HA storage provider in order to enable Job Manager failover.