bash hpc slurm

How to run multiple tasks on multiple nodes with slurm (in parallel)?


I want to run a script twice with different arguments, each task on its own node: for example, task 1 on node 1 and task 2 on node 2. With my code, only the first task is executed. I don't know what the problem is; I'm new to this. This is my code:

 #!/bin/bash

 node_names=(compute-0-4 compute-0-6)
 parameter=(parte__00 parte__01)

 #SBATCH -N 2
 #SBATCH -n 2
 #SBATCH -c 1

 srun -n1 -N1 -w $node_names[0] file.sh $parameter[0] &
 srun -n1 -N1 -w $node_names[1] file.sh $parameter[1] &
 wait

When I run the script, only the last job is queued. The output of scontrol show job lists just the second job; the first job is not queued.


Solution

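  • The #SBATCH lines must come before any non-comment line. sbatch stops processing #SBATCH directives at the first non-comment, non-whitespace line, so directives placed after the array assignments are ignored. Try something like this: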

     #!/bin/bash
     #SBATCH -N 2
     #SBATCH -n 2
     #SBATCH -c 1
    
     node_names=(compute-0-4 compute-0-6)
     parameter=(parte__00 parte__01)
    
    
     # Array elements need braces: $node_names[0] expands to "compute-0-4[0]".
     srun -n1 -N1 -w "${node_names[0]}" file.sh "${parameter[0]}" &
     srun -n1 -N1 -w "${node_names[1]}" file.sh "${parameter[1]}" &
     wait
    

    Alternatively, if your applications are completely independent, you can simply submit two separate jobs instead of trying to run everything in one job.
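The two-job alternative could look roughly like the sketch below. It is only an illustration under a few assumptions: the node names and parameters are the ones from the question, file.sh is assumed to be executable and located in the submission directory, and the SUBMIT variable and submit_jobs function are hypothetical helpers introduced here. By default the script only prints the sbatch commands it would run, so it can be checked on a machine without Slurm.

```bash
#!/bin/bash
# Sketch: one single-node job per parameter instead of one two-node job.
# Node names and parameters are taken from the question.
node_names=(compute-0-4 compute-0-6)
parameter=(parte__00 parte__01)

# Hypothetical switch: the default only prints the commands (dry run).
# Run with SUBMIT=sbatch to actually submit the jobs.
SUBMIT=${SUBMIT:-echo sbatch}

submit_jobs() {
    local i
    for i in "${!parameter[@]}"; do
        # -w pins the job to one node; --wrap turns a command line into a
        # batch script. $SUBMIT is left unquoted on purpose so that
        # "echo sbatch" splits into two words.
        $SUBMIT -N 1 -n 1 -c 1 -w "${node_names[$i]}" \
            --wrap "./file.sh ${parameter[$i]}"
    done
}

submit_jobs
```

Each job is then scheduled independently, so one run can start while the other is still waiting for its node. If the runs do not need to land on specific nodes, the -w option can be dropped and the scheduler will pick any free node.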