Is there a way to submit a job to Slurm with sbatch so that it uses a GPU if one is available, but runs on CPU if no GPU is available?
Setting #SBATCH --gres=gpu:1 only runs the job on nodes where a GPU is available.
Omitting it or setting it to 0 never makes a GPU available.
There is unfortunately no direct solution in Slurm for this use case. A workaround is to submit two jobs, one with --gres and the other without, and:
- set --job-name identically
- set --dependency=singleton on both
- put scancel --jobname <chosen job name> --state PENDING at the top of the submission script

The above configuration makes sure Slurm can start only one of the two jobs, and as soon as one starts, it cancels the other.
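As a rough sketch, the workaround could be wired up with a small wrapper like the one below. The job name gpu-or-cpu, the file name job.sh, the submit_both function, and the echo placeholder workload are all made up for illustration. The sketch also assumes the job script can read its own name from SLURM_JOB_NAME rather than hard-coding it.

```shell
#!/bin/bash
# Sketch of the two-job workaround. JOB_NAME, job.sh, submit_both and the
# echo placeholder are hypothetical; adapt them to your own workload.
JOB_NAME="gpu-or-cpu"

# Write the shared job script. The quoted 'EOF' keeps $SLURM_JOB_NAME
# unexpanded until the job actually runs.
cat > job.sh <<'EOF'
#!/bin/bash
# Whichever of the two jobs starts first cancels the one still pending.
scancel --jobname "$SLURM_JOB_NAME" --state PENDING
echo "Job $SLURM_JOB_ID started on $(hostname)"  # placeholder workload
EOF

submit_both() {
    # Same name and singleton dependency on both; only the first asks for a GPU.
    sbatch --job-name="$JOB_NAME" --dependency=singleton --gres=gpu:1 job.sh
    sbatch --job-name="$JOB_NAME" --dependency=singleton job.sh
}

# Submit only where Slurm is installed.
if command -v sbatch >/dev/null 2>&1; then
    submit_both
else
    echo "sbatch not found; wrote job.sh without submitting" >&2
fi
```

Passing --job-name and --dependency on the sbatch command line instead of as #SBATCH directives lets both submissions share a single script, so the two jobs differ only in the --gres request.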