PBS: GNU parallel: hosts allocated vary, multi CPU job, multiple jobs to some hosts

With PBSpro I can request resources to run my job. My cluster job boils down to running the same file many times, each time with a different index / job ID. Each task spawns its own sub-processes and uses 4 CPUs in total. The job is embarrassingly parallel, with every task independent of the others, and thus a good fit for GNU parallel. To make the best use of the cluster and squeeze my tasks in wherever there is space, I place a resource request with PBS as follows: #PBS -l select=60:ncpus=4:mpiprocs=1. The resulting $PBS_NODEFILE then contains the list of hosts assigned to the job, with a host listed once for every 4-CPU chunk placed on it.

The problem is that the PBSpro job manager can place several of these 4-CPU chunks on the same node, or only 1 chunk on a node, and somehow this information has to be passed to GNU parallel. Doing so with --sshloginfile $PBS_NODEFILE does not carry over the varying amount of resources available on each node (and it appears GNU parallel only uses the unique names from this list).

What goes wrong is that GNU parallel sees all the cores of a host (the physical core count of the node), regardless of whether only 1 chunk was assigned to that host. Setting a fixed number of jobs per host then either leaves allocated cores idle or runs more tasks on a host than it has allocated resources for, oversubscribing the cores.

The problem boils down to:

How can one efficiently run parallel tasks through PBSpro,
each task using more than 1 CPU,
over a random (PBS allocated) selection of nodes,
each with a varying number of assigned resources,
that don't necessarily match the actual physical resources of the node?

Solution

Use the -S flag to specify the servers, in its x/$SERVERNAME form, which tells GNU parallel to assume x CPU cores for that server instead of detecting the physical core count.

The first step is to use bash to generate the input for the -S flag:

NCPU=4

HOSTS=$(sort "$PBS_NODEFILE" | uniq -c | awk -v n="$NCPU" 'BEGIN{OFS=""}{print $1*n,"/",$2}' | tr '\n' ',' | sed 's/,$//') (credit to Hiu)

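This bash command outputs a comma-separated list of servers in the form cores/host, where cores is the number of CPU cores PBS allocated on that host (number of chunks on the host times NCPU).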
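To make the transformation concrete, here is a minimal sketch that runs the same pipeline on a stand-in node file. The host names node01 and node02 and the temporary file are hypothetical; on the cluster the input would be $PBS_NODEFILE. The sketch passes NCPU into awk with -v, since shell variables are not expanded inside a single-quoted awk program, and sorts first because uniq -c only merges adjacent duplicates.

```shell
# Hypothetical node file: node01 received two 4-CPU chunks, node02 one.
NODEFILE=$(mktemp)
printf '%s\n' node01 node02 node01 > "$NODEFILE"

NCPU=4

# Count lines per host, multiply by NCPU, and join as cores/host with commas.
HOSTS=$(sort "$NODEFILE" | uniq -c |
        awk -v n="$NCPU" '{printf "%s%d/%s", sep, $1*n, $2; sep=","}')

echo "$HOSTS"   # prints 8/node01,4/node02
rm -f "$NODEFILE"
```

The resulting string can be handed to -S as is; each entry tells GNU parallel how many cores to assume for that host.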

Thereafter run parallel as follows:

PERC=$((100/$NCPU))

seq 0 999 | parallel -j $PERC% -N1 -u -S $HOSTS "cd $PBS_O_WORKDIR; python3 $WORKING_PATH$INPUT_FILENAME {}"

Where:

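seq 0 999 generates 1000 task IDs, from 0 up to and including 999
-j $PERC% expands to -j 25% (100% / 4 CPUs per task), so each host runs one task for every 4 cores listed for it in -S
-N1 passes exactly 1 argument (one ID) to each task
-u (ungroup) prints output immediately instead of buffering it per task (this has some speed advantages, but output from different tasks can be interleaved)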
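The arithmetic behind -j $PERC% can be checked by hand. The sketch below only reproduces the intended calculation (cores listed for a host times the percentage), using a hypothetical HOSTS string; it does not call GNU parallel, and how parallel itself rounds a non-integer result is not covered here.

```shell
NCPU=4
PERC=$((100 / NCPU))            # 25; integer division, so it truncates when NCPU does not divide 100
HOSTS="8/node01,4/node02"       # hypothetical -S value, in cores/host form

TOTAL=0
for entry in $(printf '%s' "$HOSTS" | tr ',' ' '); do
    cores=${entry%%/*}          # part before the slash
    host=${entry#*/}            # part after the slash
    slots=$((cores * PERC / 100))
    echo "$host: $cores cores -> $slots concurrent task(s)"
    TOTAL=$((TOTAL + slots))
done
echo "total concurrent tasks: $TOTAL"   # 3 for this example
```

With NCPU=4 every host gets exactly one task slot per allocated chunk. For a value of NCPU that does not divide 100 evenly (for example 3, giving 33%), it is worth checking the resulting slot count before relying on it.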