apache-flink

What are blobstore files and why do they keep filling up /tmp directory?


We're running Flink on a standalone five-node cluster. The /tmp/ directory keeps filling with directories whose names start with blobstore--*. These directories are very large (approx. 1 GB each) and fill up the disk very quickly, so the jobs fail with a "No space left on device" error. The files in these directories appear to be some form of binary representation of the jobs that are running on the cluster.

What are these files, and how do I clean them up so they don't fill /tmp/ and cause jobs to fail?

Flink version: 1.4.2


Solution

  • Flink needs the blob store files to distribute a job across the cluster. They should be cleaned up automatically once the job has completed; only when the cluster crashes does the cleanup not happen.

    After a cluster restart, the old blobstore files need to be cleared by a cleanup job. When deleting the directories, be careful not to delete the directory of a running TaskManager. To find out which directory that is, look at the logs of the running TaskManagers; they should contain the path of the blob store directory.

    http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/What-are-blobstore-files-and-why-do-they-keep-filling-up-tmp-directory-td26323.html
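
The cleanup job described in the answer could look roughly like the Python sketch below. It is only an illustration, not something Flink ships. The function names, the `blobstore-*` glob pattern (taken from the directory names in the question; adjust it to what is actually on disk) and the minimum-age threshold are all assumptions. The list of in-use directories has to be filled in by hand from the logs of the running TaskManagers. Deletion is off by default, so the first run only prints what would be removed.

```python
import shutil
import time
from pathlib import Path


def find_stale_blobstore_dirs(tmp_dir, in_use, min_age_seconds,
                              pattern="blobstore-*", now=None):
    """Return blob store directories under tmp_dir that look safe to remove.

    in_use: paths taken from the logs of the running TaskManagers;
            these are never returned.
    min_age_seconds: extra safety margin (an assumption of this sketch);
            directories modified more recently than this are skipped.
    pattern: hypothetical default based on the names in the question.
    """
    now = time.time() if now is None else now
    protected = {Path(p).resolve() for p in in_use}
    stale = []
    for entry in sorted(Path(tmp_dir).glob(pattern)):
        if not entry.is_dir():
            continue  # only directories are of interest
        if entry.resolve() in protected:
            continue  # belongs to a running TaskManager
        if now - entry.stat().st_mtime < min_age_seconds:
            continue  # too recent, may still be in use
        stale.append(entry)
    return stale


def remove_dirs(dirs, dry_run=True):
    """Delete the given directories, or only report them when dry_run is set."""
    for d in dirs:
        if dry_run:
            print(f"would remove {d}")
        else:
            shutil.rmtree(d)
            print(f"removed {d}")
```

A first run might call `remove_dirs(find_stale_blobstore_dirs("/tmp", in_use=[...], min_age_seconds=24 * 3600))`, with the `in_use` list replaced by the paths found in the TaskManager logs, and switch to `dry_run=False` only after checking the printed list.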