
Questions on alternative ways to run 4 parallel jobs


Below are three different sbatch scripts that produce roughly similar results.

(I show only the parts where the scripts differ; the ## prefix indicates the output obtained by submitting the scripts to sbatch.)

Script 0

#SBATCH -n 4


srun -l hostname -s


## ==> slurm-7613732.out <==
## 0: node-73
## 1: node-73
## 2: node-73
## 3: node-73

Script 1

#SBATCH -n 1
#SBATCH -a 1-4

srun hostname -s


## ==> slurm-7613733_1.out <==
## node-72
## 
## ==> slurm-7613733_2.out <==
## node-73
## 
## ==> slurm-7613733_3.out <==
## node-72
## 
## ==> slurm-7613733_4.out <==
## node-73

Script 2

#SBATCH -N 4


srun -l -n 4 hostname -s

## ==> slurm-7613738.out <==
## 0: node-74
## 2: node-76
## 1: node-75
## 3: node-77

Q: Why would one choose one of these approaches over the others?

(I see that the jobs spawned by Script 0 all ran on the same node, but I can't tell if this is a coincidence.)


Also, the following variant of Script 2 (the only difference being -N 2 instead of -N 4) fails:

Script 3

#SBATCH -N 2


srun -l -n 4 hostname -s


## ==> slurm-7614825.out <==
## srun: error: Unable to create job step: More processors requested than permitted

Ditto for the following variant of Script 3 (the only difference being that here srun also has the flag -c 2):

Script 4

#SBATCH -N 2


srun -l -n 4 -c 2 hostname -s


## ==> slurm-7614827.out <==
## srun: error: Unable to create job step: More processors requested than permitted

Qs: Are the errors I get with Script 3 and Script 4 due to wrong syntax, wrong semantics, or site-specific configs? In other words, is there something inherently wrong with these scripts (that would cause them to fail under any instance of SLURM), or are the errors only due to violations of restrictions imposed by the particular instance of SLURM I'm submitting the jobs to? If the latter is the case, how can I pinpoint the configs responsible for the error?


Solution

  • Q: Why would one choose one of these approaches over the others?

    Script 0: you request 4 tasks, to be allocated at the same time to a single job, with no other specification as to how those tasks should be allocated to nodes. Typical use: an MPI program.
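
    A minimal sketch of that typical use (my_mpi_app is a hypothetical MPI executable, not part of the question):

    #SBATCH -n 4
    
    # srun starts one MPI rank per allocated task, i.e. 4 ranks here
    srun ./my_mpi_app
    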

    Script 1: you request 4 jobs, each with 1 task. The jobs will be scheduled independently of one another. Typical use: embarrassingly parallel jobs.
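
    A sketch of that typical use: each array task picks its own input through the SLURM_ARRAY_TASK_ID environment variable (process_chunk and the input_*.dat files are hypothetical placeholders):

    #SBATCH -n 1
    #SBATCH -a 1-4
    
    # each of the 4 independent jobs works on a different input file
    srun ./process_chunk input_${SLURM_ARRAY_TASK_ID}.dat
    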

    Script 2: you request 4 nodes, with one task per node. It is similar to Script 0, except that you request the tasks to be allocated to four distinct nodes. Typical use: an MPI program with a lot of I/O on local disks, for instance.
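
    If you prefer to state that placement explicitly rather than rely on the defaults, a sketch:

    #SBATCH -N 4
    #SBATCH --ntasks-per-node 1
    
    srun -l -n 4 hostname -s
    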

    The fact that all jobs were allocated the same first node is due to Slurm always allocating the nodes in the same order, and you probably ran the tests one after another, so each one started on the resources the previous one had just freed.
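
    One way to check, from inside a job, exactly which nodes were allocated is to expand the node list that Slurm exports:

    # prints one allocated hostname per line for the current job
    scontrol show hostnames "$SLURM_JOB_NODELIST"
    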

    Script 3: you request two nodes with, implicitly, 1 task per node, so you are allocated two tasks, but then you try to use 4 tasks with srun. You should change it to

    #SBATCH -N 2
    #SBATCH --tasks-per-node 2
    
    srun -l -n 4 hostname -s
    

    to request two tasks per node, or

    #SBATCH -N 2
    #SBATCH -n 4
    
    srun -l -n 4 hostname -s
    

    to request four tasks, with no additional constraint on the distribution of tasks across nodes.

    Script 4: you request two nodes with, implicitly, 1 task per node and, also implicitly, one CPU per task, so you are allocated two CPUs. But then you try to use 4 tasks with srun, each with 2 CPUs, so 8 in total. You should change it to

    #SBATCH -N 2
    #SBATCH --tasks-per-node 2
    #SBATCH --cpus-per-task 2    
    
    srun -l -n 4 -c 2 hostname -s
    

    or,

    #SBATCH -N 2
    #SBATCH -n 4
    #SBATCH --cpus-per-task 2    
    
    srun -l -n 4 -c 2 hostname -s
    

    The bottom line: in the submission script, you request resources with the #SBATCH directives, and you cannot use more resources than that in the subsequent calls to srun.
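
    A quick way to see what the #SBATCH directives actually gave you is to print, before any call to srun, the environment variables Slurm sets inside the batch job, for example:

    # resources granted to the job (SLURM_NTASKS may be unset if no task count was requested)
    echo "nodes:     $SLURM_JOB_NUM_NODES"
    echo "tasks:     $SLURM_NTASKS"
    echo "cpus/node: $SLURM_JOB_CPUS_PER_NODE"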