How to preserve the code of a submitted job on a Slurm cluster when updating the codebase with git pull?


How can I ensure that a job submitted to a Slurm cluster runs the code as it was at submission time, even if I update the codebase with git pull after submitting the job?

When I submit a job to a Slurm cluster, I am concerned that running git pull afterwards could change the code that the job ends up executing. I have heard that Slurm copies the code of a submitted job to a temporary directory, typically located at /var/slurm/job-jobid, once the job begins running. However, I am not sure whether this is true.

Are there any best practices or recommendations for making sure that the code of a submitted job is preserved and not affected by later updates to the codebase? Any information on how Slurm copies and preserves the code of a submitted job would also be appreciated.

Thank you in advance for your help.


Solution

  • Slurm copies the code of a submitted job to a temporary directory

    Slurm copies only the submission script, nothing else. So if your submission script uses a program "installed" by a git clone in your home directory, any git pull command issued before the job starts will affect the version of the program that runs inside the job.

    A better option is to clone the codebase inside the job submission script into a temporary, job-specific location. That way, you can specify a tag or commit hash corresponding to the version of the code you want. You can clone from the original source (e.g. GitHub), or from the clone in your home directory, which should be easier and faster. You can also clone only the revision you want rather than the whole repository (see this).
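The approach described above could look roughly like the following submission script. It is a minimal sketch, not a definitive recipe: the `#SBATCH` values, the `REPO` path, the `REV` value, the `run.sh` entry point, and the helper name `snapshot_repo` are all hypothetical placeholders. It assumes `REV` is a tag or commit hash that is reachable in the clone in your home directory.

```shell
#!/bin/bash
#SBATCH --job-name=pinned-run
#SBATCH --time=01:00:00

# Hypothetical values: adjust to your own repository and revision.
REPO="${REPO:-$HOME/myproject}"   # existing clone in your home directory
REV="${REV:-v1.2.0}"              # tag or commit hash the job should run

# Clone the repository into a fresh directory and check out exactly one
# revision there, detached from any branch.
snapshot_repo() {
    local repo="$1" rev="$2" dest="$3"
    git clone --quiet --no-checkout "$repo" "$dest" &&
        git -C "$dest" checkout --quiet --detach "$rev"
}

# Run the job body only when Slurm has actually started the job.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    # Job-specific working directory, so concurrent jobs do not collide.
    WORKDIR="$(mktemp -d "${TMPDIR:-/tmp}/job-${SLURM_JOB_ID}-XXXXXX")"
    snapshot_repo "$REPO" "$REV" "$WORKDIR/src" || exit 1
    cd "$WORKDIR/src" || exit 1
    ./run.sh                      # hypothetical entry point of the project
fi
```

To freeze the revision at submission time rather than at job start, one option is to resolve the hash when submitting, for example `sbatch --export=ALL,REV=$(git -C ~/myproject rev-parse HEAD) job.sh` (with `job.sh` being the script above). A later git pull then moves the branch in your home directory, but the job still checks out the recorded commit.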