
How to tell which node is executing code as it executes in Slurm?


I am very new to Slurm and distributed/parallel computing, so I am hoping someone can shed some light on my issue, bearing in mind that this might be a very simple problem to solve.

I've set up a cluster using 6 slave Pi3s (and 1 master) and installed the Slurm workload manager to help with allocation of resources etc.

Before I get into more complex code, I am trying to test something super simple. I am sending the same command to all my nodes, and printing the hostname of the node.

My current code (that works) is:

srun --nodes=6 hostname

and returns:

node01
node05
node04
node02
node06
node03

Now I try to run the same type of command using sbatch with the following script:

 #!/bin/bash
 #SBATCH --nodes=6
 #SBATCH --partition=partition
 #SBATCH --ntasks-per-node=1

 cd $SLURM_SUBMIT_DIR
 srun printf 'Hello from: %s\n' $(hostname) >> out.txt

expecting a similar result as above, but instead I get:

Hello from: node01
Hello from: node01
Hello from: node01
Hello from: node01
Hello from: node01
Hello from: node01

I tried playing around with the SLURM_NODEID and SLURMD_NODENAME env variables but still couldn't get it to do what I wanted.

I just want to know which node is running the code. The purpose is that, further down the line, I can trace which operations are done by which nodes in more complex scripts, compare the performance of nodes that are expected to be "identical", and perhaps even trace which nodes are performing which portion of a parallelized case.

Many thanks!


Solution

  • In the line srun printf 'Hello from: %s\n' $(hostname) >> out.txt, the command substitution $(hostname) is expanded by Bash on the node that runs the batch script, before the command is handed to srun. So your script is effectively equivalent to

    HOST=$(hostname)
    srun printf 'Hello from: %s\n' $HOST >> out.txt
    

    Every task therefore runs the same printf command with the same, already expanded hostname (that of the node where the batch script runs, here node01). If you simply run

    srun hostname
    

    in your submission script, you will see the same kind of result as when you run srun directly (outside of a submission script): one line per node, each with that node's own hostname.

    If you want to run printf, defer the expansion to a shell that srun starts on each node, for example:

    srun bash -c "printf 'Hello from: %s\n' \$(hostname)" >> out.txt
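
The SLURM_NODEID and SLURMD_NODENAME variables mentioned in the question are subject to the same expansion rule: they only carry per-task values inside the shell that srun starts on each node. The following is a minimal sketch, assuming those variables and SLURM_PROCID are set in each task's environment as the srun documentation describes. TASK_CMD is a hypothetical helper name, and the check for srun is only there so the script fails gracefully outside a Slurm installation.

```shell
#!/bin/bash
#SBATCH --nodes=6
#SBATCH --partition=partition
#SBATCH --ntasks-per-node=1

# Single quotes: nothing is expanded by the batch shell here.
# The variables are expanded later, by the bash that srun starts for each
# task, so every task reports its own values.
# TASK_CMD is a hypothetical name chosen for this sketch.
TASK_CMD='printf "Hello from: %s (node id %s, task %s)\n" "$SLURMD_NODENAME" "$SLURM_NODEID" "$SLURM_PROCID"'

if command -v srun >/dev/null 2>&1; then
    # One task per node; each task runs TASK_CMD in its own shell.
    srun bash -c "$TASK_CMD" >> out.txt
else
    echo "srun not found; submit this script with sbatch on the cluster" >&2
fi
```

If the same printf were passed to srun directly with the variables in double quotes, the batch shell would expand them once, before any task starts, and every line in out.txt would carry the same values. Prefixing log lines with the node name and task id in this way is one option for tracing which node performed which part of a larger job.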