I have software that requires a plain-text list of the nodes to which tasks are sent, with one line per task. For example, if my job was launched with -n 4 -c 1, and I get 3 CPUs on node1 and 1 CPU on node2, I'd like to get a file such as:
node1
node1
node1
node2
How can I get such a list?
I tried using:
scontrol show hostnames $SLURM_JOB_NODELIST
But this only works if all the tasks are assigned to separate nodes. In the example above, it would just result in:
node1
node2
So the software would send only one task to each node and underutilize the CPUs allocated on node1.
Thanks! Miguel.
The easiest (though perhaps not the most canonical) way is probably to run:
srun hostname > hostfile
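The file hostfile will contain the list of hostnames, with each hostname appearing as many times as the number of tasks that were allocated to that host. Because every task writes its own line, the lines are not guaranteed to come out grouped by node; if your software cares about that, pipe the output through sort (srun hostname | sort > hostfile).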
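If launching an extra job step just to run hostname is undesirable, a similar list can in principle be built from environment variables that Slurm sets inside the job. SLURM_TASKS_PER_NODE holds the per-node task counts in the same order as SLURM_JOB_NODELIST, in a compressed form such as 3,1 or 2(x3),1 (the "(xR)" suffix meaning "repeated for R consecutive nodes"). The sketch below expands that string and repeats each hostname from scontrol show hostnames accordingly. It is a minimal sketch assuming that compressed format; the helper name expand_hostfile is hypothetical, not a Slurm command, and it has not been checked against every Slurm version.

```shell
#!/usr/bin/env bash
# Sketch: build a one-line-per-task hostfile without launching a job step.
# Assumes SLURM_TASKS_PER_NODE uses Slurm's compressed format, e.g. "3,1"
# or "2(x3),1", listed in the same order as SLURM_JOB_NODELIST.
# expand_hostfile is a hypothetical helper name, not part of Slurm.

# Reads hostnames (one per line) on stdin; $1 is the tasks-per-node string.
expand_hostfile() {
    local spec=$1
    local re='^([0-9]+)\(x([0-9]+)\)$'
    local -a items=() counts=()
    local item count reps i host idx=0

    # Split on commas, then expand each "N(xR)" into R copies of N;
    # a plain "N" becomes a single entry.
    IFS=',' read -ra items <<< "$spec"
    for item in "${items[@]}"; do
        if [[ $item =~ $re ]]; then
            count=${BASH_REMATCH[1]}
            reps=${BASH_REMATCH[2]}
        else
            count=$item
            reps=1
        fi
        for ((i = 0; i < reps; i++)); do
            counts+=("$count")
        done
    done

    # Pair the n-th hostname with the n-th count and print it that many times.
    while IFS= read -r host; do
        for ((i = 0; i < counts[idx]; i++)); do
            printf '%s\n' "$host"
        done
        idx=$((idx + 1))
    done
}
```

Inside the batch script it would be used as scontrol show hostnames "$SLURM_JOB_NODELIST" | expand_hostfile "$SLURM_TASKS_PER_NODE" > hostfile. Unlike srun hostname, this yields lines already grouped by node, but it relies on SLURM_TASKS_PER_NODE matching the task layout your software should use.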