apache-spark apache-spark-standalone

Shuffle file cleanup in Spark with the external shuffle service


We are using Spark 3.0.1 (standalone mode) with dynamic allocation and external shuffle service.
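For readers reproducing the setup, here is a minimal sketch of the two settings it implies, rendered as `spark-submit` arguments. The keys `spark.dynamicAllocation.enabled` and `spark.shuffle.service.enabled` are standard Spark properties. The helper `to_submit_args` and the `BASE_CONF` name are hypothetical and exist only for illustration.

```python
# Settings implied by "dynamic allocation and external shuffle service".
# BASE_CONF and to_submit_args are illustrative names, not part of Spark.
BASE_CONF = {
    "spark.dynamicAllocation.enabled": "true",
    "spark.shuffle.service.enabled": "true",
}


def to_submit_args(conf):
    """Render a conf mapping as a flat list of spark-submit --conf arguments."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


if __name__ == "__main__":
    print(" ".join(to_submit_args(BASE_CONF)))
```

In standalone mode the workers that host the shuffle service also need `spark.shuffle.service.enabled` set on their side.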

After switching to dedicated persistent disks we started getting "out of disk space" errors, so we looked into the /tmp folder and noticed that many older application shuffle files still exist there. That is somewhat understandable, as shuffle files should stay available as long as the worker is alive.
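To quantify how much space stale shuffle files take, a small scan like the following could help. It is only a sketch: it assumes shuffle files are named `shuffle_<shuffleId>_<mapId>_<reduceId>.data` or `.index` and uses file modification time as a proxy for age. The function name `stale_shuffle_usage` is hypothetical, and the script only reports, it deletes nothing.

```python
import os
import re
import time

# Assumption: shuffle files follow the shuffle_<shuffleId>_<mapId>_<reduceId> naming.
SHUFFLE_FILE = re.compile(r"^shuffle_\d+_\d+_\d+\.(data|index)$")


def stale_shuffle_usage(root, min_age_seconds, now=None):
    """Return (file_count, total_bytes) for shuffle files older than min_age_seconds."""
    now = time.time() if now is None else now
    count, total = 0, 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not SHUFFLE_FILE.match(name):
                continue
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # the file disappeared while we were scanning
            if now - st.st_mtime >= min_age_seconds:
                count += 1
                total += st.st_size
    return count, total


if __name__ == "__main__":
    files, size = stale_shuffle_usage("/tmp", min_age_seconds=24 * 3600)
    print(f"{files} shuffle files older than a day, {size} bytes")
```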

But at what point do these files get deleted? I would assume that on job completion the shuffle files are no longer needed and can (and should) be deleted, but the Spark code base (ExternalShuffleBlockResolver) shows that the directory cleaner is triggered only after:

  • applicationRemoved - deletes entire executor dir (shuffle files included)
  • executorRemoved - deletes only non-shuffle files
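The two triggers above can be modelled with a toy Python sketch. This is not Spark's implementation. It only mirrors the behavior described in the list, and it assumes that "shuffle files" means files ending in `.data` or `.index`. The function names are hypothetical.

```python
import os
import shutil

# Assumption: files with these suffixes are treated as shuffle files.
SHUFFLE_SUFFIXES = (".data", ".index")


def on_executor_removed(executor_dir):
    """Delete only non-shuffle files under executor_dir; return the removed paths."""
    removed = []
    for dirpath, _dirnames, filenames in os.walk(executor_dir):
        for name in filenames:
            if not name.endswith(SHUFFLE_SUFFIXES):
                path = os.path.join(dirpath, name)
                os.remove(path)
                removed.append(path)
    return removed


def on_application_removed(executor_dir):
    """Delete the whole executor directory, shuffle files included."""
    shutil.rmtree(executor_dir, ignore_errors=True)
```

Neither hook in this model runs when a single job finishes, which is why shuffle files pile up in a long-running application.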

Am I missing anything? Which service should clean up the shuffle files of completed or failed jobs?

Note that we use a multi-tenant architecture, meaning our Spark app acts as a long-running server that handles multiple requests.


Solution

  • This feature was implemented in Spark starting with version 3.4.0. Upgrade your Spark cluster to fix the issue. Check the fix in this PR.