cluster-computing scheduling slurm

Is it possible to execute a post-script after Slurm job execution?


Is it possible to tell Slurm to execute a specific script, for example post-script.py, after the submitted job has completed?

I do not want to submit a new job, just run the script on the login node.

Something like...

#SBATCH --at-end-run="bash post-script.sh"

Or is the only option to check every N minutes whether the job has completed?


Solution

  • The short answer is that there is no such option in Slurm.

    If post-script.sh can run on a compute node, the best option would be

    • if it is short: to add it at the end of the job submission script
    • if it is long: to submit it as its own job and use the --dependency option to make it start at the end of the first job.
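
A minimal sketch of the dependency approach, assuming the main job lives in a file called job.sh (a placeholder name) and that post-script.sh is itself a valid batch script. The helper name submit_with_post_job is made up for illustration; the sbatch options used are --parsable and --dependency.

```shell
#!/usr/bin/env bash
# Submit the main job, then a second job that starts only once the first has ended.
# job.sh and post-script.sh are placeholder file names.

submit_with_post_job() {
    local main_script=$1 post_script=$2 job_id
    # --parsable makes sbatch print only the job ID (optionally followed by ";cluster").
    job_id=$(sbatch --parsable "$main_script") || return 1
    job_id=${job_id%%;*}   # drop the ";cluster" suffix if present
    # afterany: start once the first job has terminated, whatever its exit state.
    # Use afterok instead to run the post-script only if the first job succeeded.
    sbatch --dependency="afterany:${job_id}" "$post_script"
}

# Usage: submit_with_post_job job.sh post-script.sh
```

Note that the dependent job still runs on a compute node, so this only helps when the post-script does not need the login node.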

    If you have root privileges, you can use strigger to run post-script.sh after the job has completed. The script would then run on the slurmctld server.
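
As a rough sketch of the strigger route, the helper below registers a trigger that fires when a given job finishes. The function name register_post_trigger and the example path are hypothetical; it assumes the script path is absolute and visible on the slurmctld host, and that you have the privileges mentioned above.

```shell
#!/usr/bin/env bash
# Register a trigger that runs a program once the given job has finished.

register_post_trigger() {
    local job_id=$1 program=$2
    # strigger expects a fully qualified path for the program, so reject relative ones.
    case "$program" in
        /*) ;;
        *) echo "program must be an absolute path: $program" >&2; return 1 ;;
    esac
    # --set creates the trigger, --fini fires it when the job completes execution.
    strigger --set --jobid="$job_id" --fini --program="$program"
}

# Usage (hypothetical job ID and path):
#   register_post_trigger 1234 /home/user/post-script.sh
```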

    If post-script.sh must run on the login node, for external network access for instance, then the first two options would still work if you are able and allowed to SSH from a compute node to a login node. This is sometimes prevented or forbidden, but if not, then you can run ssh login.node bash post-script.sh at the end of the submission script or in a job of its own.
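
The SSH variant could look roughly like this inside the submission script. The helper name run_on_login_node is made up, login.node stands in for your site's login host, and the sketch assumes key-based (passwordless) SSH from compute nodes is permitted.

```shell
#!/usr/bin/env bash
# Run a script on the login node from within a job, over SSH.

run_on_login_node() {
    local login_host=$1 script=$2
    # BatchMode=yes makes ssh fail right away instead of waiting for a password prompt,
    # which would otherwise hang a non-interactive job.
    # A relative script path is resolved from your home directory on the login node.
    ssh -o BatchMode=yes "$login_host" bash "$script"
}

# At the end of the submission script, after the main computation:
#   run_on_login_node login.node post-script.sh
```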

    If that is not a possibility, then "busy polling" is indeed needed. You can do it in a Bash loop making sure not to put too large a burden on the Slurm server (every 5 minutes is OK, every 5 seconds is useless and harmful to the system).
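
A possible shape for such a polling loop, run on the login node, is sketched below. The function name wait_for_job is hypothetical, and the list of states treated as "still active" is an assumption that covers the common cases rather than every state Slurm defines.

```shell
#!/usr/bin/env bash
# Block until the given job is no longer active, polling squeue at a gentle interval.

wait_for_job() {
    local job_id=$1 interval=${2:-300}   # default: poll every 5 minutes
    # -h drops the header, -j selects the job, -t restricts output to active states.
    # Empty output means the job is no longer in any of those states (or is unknown).
    while [ -n "$(squeue -h -j "$job_id" \
                  -t PENDING,CONFIGURING,RUNNING,SUSPENDED,COMPLETING 2>/dev/null)" ]; do
        sleep "$interval"
    done
}

# Usage (hypothetical job ID):
#   wait_for_job 1234 && bash post-script.sh
```

Running this under nohup or inside a tmux/screen session keeps it alive after you log out, if your site allows long-lived processes on the login node.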

    You can also use a dedicated workflow management tool such as Maestro that will allow you to define a job and a dependent task to run on the login node.

    See some general information about workflows on HPC systems here.