Is it possible to set a requeue option so that the JOBID changes when Slurm decides to requeue a job (after a node failure, for instance), so that the folder associated with the first JOBID is not overwritten?
Thanks,
A requeued job is still the same job, so the job ID will not change.
What you can do is prevent requeuing with the --no-requeue option. But then you will need to resubmit the job, either by hand or using a workflow manager.
Another option is to append the restart count to the folder name. For instance, if your submission script has lines such as
WORKDIR=/some/path/${SLURM_JOB_ID}
mkdir -p $WORKDIR
cd $WORKDIR
you can replace it with
WORKDIR=/some/path/${SLURM_JOB_ID}${SLURM_RESTART_COUNT}
mkdir -p $WORKDIR
cd $WORKDIR
On the first run, $SLURM_RESTART_COUNT will be unset, leaving the original behaviour. On later runs it will be set to 1, 2, and so on, effectively suffixing the job ID with the requeue number.
For the output file, you can use --open-mode=append to avoid overwriting it when the job restarts.
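Putting the pieces together, here is a minimal sketch of a submission script. The helper name attempt_dir and the _r separator are my own choices for illustration, not anything Slurm provides. The separator avoids ambiguity between, say, job 12 on restart 3 and job 123, and the restart count defaults to 0 on the first run, when $SLURM_RESTART_COUNT is unset.

```shell
#!/bin/bash
#SBATCH --requeue
#SBATCH --open-mode=append

# Hypothetical helper: print <base>/<job id>_r<restart count>.
# An empty or missing restart count (first run) is treated as 0.
attempt_dir() {
    printf '%s/%s_r%s\n' "$1" "$2" "${3:-0}"
}

# Only create and enter the directory when actually running under Slurm.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    WORKDIR=$(attempt_dir /some/path "$SLURM_JOB_ID" "${SLURM_RESTART_COUNT:-}")
    mkdir -p "$WORKDIR"
    cd "$WORKDIR" || exit 1
fi
```

With this, each attempt gets its own folder (for example, one ending in _r0 for the first run and _r1 after the first requeue), while the job's output file keeps accumulating across restarts.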