cluster-computing, batch-processing, slurm

Solving SLURM "sbatch: error: Batch job submission failed: Requested node configuration is not available" error


We have 4 GPU nodes with two 36-core CPUs and 200 GB of RAM available on our local cluster. When I try to submit a job with the following configuration:

#SBATCH --nodes=1
#SBATCH --ntasks=40
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1500MB
#SBATCH --gres=gpu:4
#SBATCH --time=0-10:00:00

I'm getting the following error:

sbatch: error: Batch job submission failed: Requested node configuration is not available

What might be the reason for this error? The nodes have exactly the kind of hardware that I need...


Solution

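  • The CPUs most likely have 36 threads each, not 36 cores, and Slurm is probably configured to allocate cores rather than threads. If, for example, each CPU has 18 physical cores with two hardware threads per core, a node offers only 2 × 18 = 36 allocatable cores, which is fewer than the 40 single-CPU tasks requested on one node.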

    Check the output of scontrol show nodes (in particular the CPUTot, Sockets, CoresPerSocket, and ThreadsPerCore fields) to see what the nodes really offer.
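
    The check above can be sketched as a short shell script. It is only an illustration: the helper functions allocatable_cores and job_fits are hypothetical names, and the figures of 2 sockets and 18 cores per socket are assumed values, not numbers reported for this cluster. Replace them with whatever scontrol prints for your nodes.

```shell
#!/usr/bin/env bash
# Sketch: compare a job's CPU request with what a single node can allocate.

# Print the hardware layout Slurm has recorded for each node.
# Guarded so the script still runs on a machine without Slurm installed.
if command -v scontrol >/dev/null 2>&1; then
    scontrol show nodes | grep -E 'NodeName=|CPUTot=|CoresPerSocket=|ThreadsPerCore=|RealMemory=|Gres='
fi

# allocatable_cores SOCKETS CORES_PER_SOCKET
# Hypothetical helper: physical cores per node, i.e. what is available
# if Slurm allocates cores rather than threads.
allocatable_cores() {
    echo $(( $1 * $2 ))
}

# job_fits NTASKS CPUS_PER_TASK SOCKETS CORES_PER_SOCKET
# Hypothetical helper: succeeds (exit 0) if the request fits on one node.
job_fits() {
    local needed=$(( $1 * $2 ))
    local available
    available=$(allocatable_cores "$3" "$4")
    [ "$needed" -le "$available" ]
}

# Assumed example values: 2 sockets x 18 cores, 40 tasks with 1 CPU each.
if job_fits 40 1 2 18; then
    echo "request fits on one node"
else
    echo "request exceeds one node: need 40 CPUs, node has $(allocatable_cores 2 18) cores"
fi
```

    If the node really does expose fewer cores than the job asks for, lowering --ntasks to the number of available cores or spreading the job over more nodes with a larger --nodes value should let the submission through, provided the GPU and memory requests also match what scontrol reports.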