
How to submit multiple independent tasks via SLURM?


I have a large number of computation chunks that I would like to run in parallel on a SLURM cluster. Each chunk reads its inputs from files and writes its outputs to separate files; there is no communication between chunks, so they do not need to run at the same time. I would like to spread the chunks across nodes and CPUs. Each chunk is single-threaded and requires only one CPU, so the number of chunks that can run in parallel on one node is limited by the RAM available on that node.

Note: I use the term "computation chunk" to avoid confusion with the SLURM terms "job" and "task".

What is the best way to send the computation chunks to the cluster via SLURM?

I assume that it is not a good idea to submit a job requesting multiple nodes, i.e. setting --nodes to a value larger than 1. This would require multiple nodes to be available at the same time, which could increase the waiting time in the queue.

I could submit multiple jobs or use job arrays to submit multiple jobs at once. But how can I run multiple computation chunks on one node? Can multiple jobs run on the same node?

If I set --ntasks to the number of computation chunks I need to run and --ntasks-per-node to the number of cores on a node, would SLURM automatically spread the processes across nodes? If so, would these nodes be requested synchronously, i.e. causing a longer queuing time?

As far as I understand, I can use --ntasks and srun to run a command multiple times (even without MPI) on one node. But how would each process know which chunk to compute? Is there an environment variable for tasks analogous to SLURM_ARRAY_TASK_ID?

Is there any difference between using a shell loop in the script and using --ntasks?

This SO answer gives a very good overview of the implications of the options --ntasks and --ntasks-per-node, but it does not really answer my questions.


Solution

  • But how can I run multiple computation chunks on one node? Can multiple jobs run on the same node?

    Yes. Slurm will happily run several jobs on the same node. For a job to request a full node (regardless of how many CPU cores it actually uses), it must include

    #SBATCH --exclusive
    

    By default, jobs share nodes and RAM, but not CPU cores/threads.

    You can just submit a bunch of small jobs, each requesting a single core, e.g.

    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=1
    #SBATCH --ntasks-per-core=1
    #SBATCH --cpus-per-task=1
    

    (the above is probably more specific than it needs to be, but it will work).
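
    As a minimal sketch, a single-chunk job script built from those options could look like the following; the compute_chunk program and the chunk.sh file name are placeholders for whatever your chunks actually are, and the chunk index is passed as the first argument:

    #!/bin/bash
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=1
    #SBATCH --cpus-per-task=1

    # The chunk index arrives as the first command-line argument
    # (e.g. sbatch chunk.sh 5).
    ./compute_chunk "$1"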

  • Is there any difference between using a shell loop in the script and using --ntasks?

    Not much, provided your loop launches the subprocesses in the background (e.g. using &) and waits for them at the end. Having Slurm do the work might be easier, though: you can then call sbatch in a loop instead, and no & is required.
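
    For the shell-loop variant, a minimal sketch could look like this, assuming 8 chunks numbered 0-7 running on a single node and the same hypothetical compute_chunk program as above:

    #!/bin/bash
    #SBATCH --nodes=1
    #SBATCH --ntasks=8
    #SBATCH --cpus-per-task=1

    # Run the 8 chunks concurrently in the background on this node,
    # then wait so the job does not exit before they all finish.
    for i in $(seq 0 7); do
        ./compute_chunk "$i" &
    done
    wait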

  • But how would each process know which chunk to compute?

    If you launch 100 jobs, you can simply give each one a specific chunk, indexed by an integer: pass that integer to your job script and increment it after each submission (sbatch). For a large number of jobs, script this rather than doing it manually.
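
    A submission loop along those lines might look like the following, reusing the hypothetical chunk.sh script sketched above, which receives the chunk index as $1:

    # Submit one job per chunk; chunk.sh is the single-chunk job script
    # shown earlier and gets the chunk index as its first argument.
    for i in $(seq 0 99); do
        sbatch chunk.sh "$i"
    done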

    There are also job arrays, as you say, which might be a better fit for your situation. In the end, though, there is nothing you cannot achieve by just launching a swarm of small, regular jobs.
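
    As a rough sketch, the job-array version of the same idea could look like this, with SLURM_ARRAY_TASK_ID selecting the chunk (compute_chunk is again a placeholder):

    #!/bin/bash
    #SBATCH --nodes=1
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=1
    #SBATCH --array=0-99

    # Each array task computes the chunk whose index matches its array task ID.
    ./compute_chunk "$SLURM_ARRAY_TASK_ID"

    A single sbatch call then submits all 100 array tasks, and Slurm schedules them independently across nodes.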

    Just go for it. Slurm is pretty smart about sharing resources. The dumbest thing you can come up with will probably work. Good luck!