I am very new to Slurm and distributed/parallel computing, so I'm hoping someone can shed some light on my issue, bearing in mind that this might be a very simple problem to solve.
I've set up a cluster using 6 slave Pi3s (and 1 master) and installed the Slurm workload manager to help with allocation of resources etc.
Before I get into more complex code, I am trying to test something super simple. I am sending the same command to all my nodes, and printing the hostname of the node.
My current code (that works) is:
srun --nodes=6 hostname
and returns:
node01
node05
node04
node02
node06
node03
Now I try to run the same type of command using sbatch with the following script:
#!/bin/bash
#SBATCH --nodes=6
#SBATCH --partition=partition
#SBATCH --ntasks-per-node=1
cd $SLURM_SUBMIT_DIR
srun printf 'Hello from: %s\n' $(hostname) >> out.txt
expecting a similar result as above, but instead I get:
Hello from: node01
Hello from: node01
Hello from: node01
Hello from: node01
Hello from: node01
Hello from: node01
I tried playing around with the SLURM_NODEID and SLURMD_NODENAME env variables but still couldn't get it to do what I wanted.
I just want to know which node is running the code. The purpose is so that further down the line I am able to trace which operations are done by which nodes for more complex scripts. Maybe compare the performance between nodes that are expected to be "identical". Perhaps even trace which nodes are performing what portion of a parallelized case?
Many thanks !!!!
In the line printf 'Hello from: %s\n' $(hostname) >> out.txt, the $(hostname) part is evaluated by Bash, on the node where the batch script runs, before the command is handed to srun. So basically your script is equivalent to
HOST=$(hostname)
srun printf 'Hello from: %s\n' $HOST >> out.txt
This runs the same printf command, with the same already-expanded value, on every node. If you simply run
srun hostname
in your submission script, you will see a result identical to the one you obtain when running srun directly (outside of a submission script).
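The early-versus-late expansion can be reproduced on a single machine, without Slurm. This is only a sketch: it uses the shell's PID ($$) as a stand-in for the hostname, and the variable names outer, early and late are made up for the illustration.

```shell
#!/bin/bash
# PID of the outer shell, standing in for "the node running the batch script".
outer=$$

# Double quotes: the outer shell substitutes $$ before the child shell starts,
# just as $(hostname) is substituted before srun launches anything.
early=$(bash -c "echo $$")

# Single quotes: the text reaches the child shell untouched, so the child
# substitutes its own PID, just as each srun task would run hostname itself.
late=$(bash -c 'echo $$')

echo "outer shell:    $outer"
echo "expanded early: $early"
echo "expanded late:  $late"
```

The early value matches the outer shell, while the late value belongs to the child process. In the batch script, the first case is why every line says node01.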
If you want to run printf, you should do something like this:
srun bash -c "printf 'Hello from: %s\n' \$(hostname)" >> out.txt
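The same idea should apply to the SLURM_* variables mentioned in the question: they have to be expanded inside each task, not in the batch script. Below is a minimal sketch, assuming your Slurm version sets SLURMD_NODENAME and SLURM_PROCID in each task's environment (the srun man page lists both). The names task_cmd and run_tasks, and the fallback branch for machines without srun, are hypothetical and only there so the script can be dry-run locally.

```shell
#!/bin/bash
# Single quotes keep everything unexpanded until the task itself runs.
# The ${VAR:-...} fallbacks only matter when testing outside Slurm.
task_cmd='printf "Hello from: %s (task %s)\n" "${SLURMD_NODENAME:-$(hostname)}" "${SLURM_PROCID:-0}"'

run_tasks() {
    if command -v srun >/dev/null 2>&1; then
        # Inside a job: one copy of the command per task, each on its own node.
        srun bash -c "$task_cmd"
    else
        # Local dry run without Slurm: a single "task" on this machine.
        bash -c "$task_cmd"
    fi
}

run_tasks >> out.txt
```

Tagging each line with the task number as well as the node name makes it easier to match output to tasks later, when several tasks share a node.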