I have a program that uses the master/slave concept for parallelization. There is a master directory and multiple worker directories. I first run the executable in the master directory, then go to each worker directory and run the worker executable there. The master waits for the workers to finish their jobs and send their results back for further calculations. The jobs of the worker directories are independent of each other, so they can be run on different machines (nodes). The master and workers communicate with each other using the TCP/IP protocol.
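For reference, the manual sequence described above looks roughly like this on a single machine (a sketch only; the master.exe /h :4004 and worker.exe /h 127.0.0.1:4004 invocations and the directory names are the same ones used in my script below, and the number of worker directories in the loop is just illustrative):

# Start the master first; it listens on TCP port 4004 and waits for workers.
cd /To-master-directory
master.exe /h :4004 &
MASTER_PID=$!

# Then start one worker per wrk directory; each connects back to the master
# over TCP, runs its independent job, and sends its results back.
WORKER_PIDS=()
for d in /To-Parent/wrk1 /To-Parent/wrk2 /To-Parent/wrk3; do
    ( cd "$d" && worker.exe /h 127.0.0.1:4004 ) &
    WORKER_PIDS+=($!)
done

wait "${WORKER_PIDS[@]}"   # block until every worker has finished
kill $MASTER_PID           # stop the master once the results are in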
I'm working on a cluster of 16 nodes, each with 28 cores, managed by the Slurm job scheduler. I can run my jobs with 20 workers on one node totally fine. Currently my Slurm script looks like this:
#!/bin/bash
#SBATCH -n 1 # total number of tasks requested
#SBATCH --cpus-per-task=18 # cpus to allocate per task
#SBATCH -p shortq # queue (partition) -- defq, eduq, gpuq.
#SBATCH -t 12:00:00 # run time (hh:mm:ss) - 12.0 hours in this.
cd /To-master-directory
master.exe /h :4004 &
MASTER_PID=$!
cd /To-Parent
# This is the directory that contains all worker (wrk) directories
parallel -i bash -c "cd {} ; worker.exe /h 127.0.0.1:4004" -- \
    wrk1 wrk2 wrk3 wrk4 wrk5 wrk6 wrk7 wrk8 wrk9 wrk10 wrk11 wrk12 wrk13 wrk14 \
    wrk15 wrk16 wrk17 wrk18 wrk19 wrk20
kill ${MASTER_PID}
I was wondering how I can modify this script to divide the worker jobs between multiple nodes. For example, so that the jobs associated with wrk1 to wrk5 run on node 1, the jobs associated with wrk6 to wrk10 run on node 2, and so on.
First, you need to let Slurm allocate distinct nodes for your job, so you need to remove the --cpus-per-task option and rather ask for 18 tasks.

Second, you need to get the hostname where the master runs, as 127.0.0.1 will no longer be valid in a multi-node setup.

Third, just add srun before the call to bash in parallel. With the --exclusive -n 1 -c 1 options, srun will dispatch each instance of the worker spawned by parallel to one of the CPUs in the allocation. They might be on the same node or on other nodes.
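To see that placement on its own, a quick test inside an allocation could look like this (a sketch; hostname simply reports the node each job step landed on):

# Launch 18 one-CPU job steps; each prints the node it was placed on.
for i in {1..18}; do
    srun --exclusive -n 1 -c 1 hostname &
done
wait   # nothing else runs in the background here, so a bare wait is fine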
So, applied to your script, the following could work (untested):
#!/bin/bash
#SBATCH -n 18 # total number of tasks requested
#SBATCH -p shortq # queue (partition) -- defq, eduq, gpuq.
#SBATCH -t 12:00:00 # run time (hh:mm:ss) - 12.0 hours in this.
cd /To-master-directory
master.exe /h :4004 &
MASTER_PID=$!
MASTER_HOSTNAME=$(hostname)
cd /To-Parent
# This is the directory that contains all worker (wrk) directories
parallel -i srun --exclusive -n 1 -c 1 bash -c "cd {} ; worker.exe /h $MASTER_HOSTNAME:4004" -- \
    wrk1 wrk2 wrk3 wrk4 wrk5 wrk6 wrk7 wrk8 wrk9 wrk10 wrk11 wrk12 wrk13 wrk14 \
    wrk15 wrk16 wrk17 wrk18 wrk19 wrk20
kill ${MASTER_PID}
Note that in your example, with 18 tasks and 20 directories to process, the job will first run 18 workers, and the two additional ones will be 'micro-scheduled': each starts whenever a previous worker finishes.
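If you would rather not depend on parallel, the same dispatch can be written as a plain bash loop of srun job steps (an equally untested sketch, reusing the names and paths from the script above):

#!/bin/bash
#SBATCH -n 18 # total number of tasks requested
#SBATCH -p shortq # queue (partition)
#SBATCH -t 12:00:00 # run time (hh:mm:ss)
cd /To-master-directory
master.exe /h :4004 &
MASTER_PID=$!
MASTER_HOSTNAME=$(hostname)
cd /To-Parent
# One job step per worker directory; --exclusive -n 1 -c 1 gives each step a
# single CPU of the allocation, so 18 workers run at once and the remaining
# two start as soon as a CPU becomes free.
WORKER_PIDS=()
for d in wrk{1..20}; do
    srun --exclusive -n 1 -c 1 bash -c "cd $d && worker.exe /h $MASTER_HOSTNAME:4004" &
    WORKER_PIDS+=($!)
done
wait "${WORKER_PIDS[@]}"   # wait for the worker steps only, not the master
kill ${MASTER_PID}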