
Running a job on CPU by default, but on GPU when available in Slurm


Is there a way to submit a job to Slurm with sbatch so that it uses a GPU if one is available, but runs on the CPU if none is?

Setting: with #SBATCH --gres=gpu:1, the job only runs on nodes where a GPU is available. Omitting the option, or setting the count to 0, never makes a GPU available.


Solution

  • There is unfortunately no direct solution in Slurm for this use case. A workaround is to submit two jobs, one with --gres and the other without, while

    • giving both jobs the same name with --job-name
    • setting --dependency=singleton on both
    • inserting scancel --jobname <chosen job name> --state PENDING at the top of the submission script

    With this configuration, Slurm can run only one of the two jobs at a time, and as soon as one starts, it cancels the other, which is still pending.
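
The workaround above could be sketched roughly as follows. This is only an illustration under assumptions: the job name gpu_or_cpu_demo, the helper functions make_job_script and submit, and the payload ./payload.sh are hypothetical placeholders, not part of Slurm. If sbatch is not installed, the sketch just prints the two scripts instead of submitting them.

```shell
#!/bin/bash
# Hypothetical job name; pick one that no other job of yours uses,
# because scancel below matches pending jobs by name.
JOB_NAME="gpu_or_cpu_demo"

# Print a batch script. $1 is an optional extra #SBATCH line
# (the GPU request for one variant, empty for the other).
make_job_script() {
    cat <<EOF
#!/bin/bash
#SBATCH --job-name=${JOB_NAME}
#SBATCH --dependency=singleton
$1

# First action once this job starts: cancel the twin that is still pending.
scancel --jobname ${JOB_NAME} --state PENDING

# Hypothetical payload; replace with the real workload.
./payload.sh
EOF
}

# Submit the script through sbatch's standard input when sbatch exists;
# otherwise fall back to a dry run that only prints the script.
submit() {
    if command -v sbatch >/dev/null 2>&1; then
        make_job_script "$1" | sbatch
    else
        make_job_script "$1"
    fi
}

submit "#SBATCH --gres=gpu:1"   # variant that requests one GPU
submit ""                       # variant without a GPU request
```

Whichever variant Slurm schedules first runs the scancel line and removes the other from the queue, so the payload has to cope with both cases, for example by checking at run time whether a GPU is visible to it.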