Tags: mpi, cluster-computing, batch-processing, job-scheduling, lsf

Batch script for LSF when only one MPI process among many has 2 or more threads


My program uses MPI+pthreads: n-1 of the MPI processes are pure MPI code, while a single MPI process uses pthreads. That last process has only 2 threads (the main thread and one pthread). Suppose the HPC cluster I want to run this program on consists of compute nodes with 12 cores each. How should I write my batch script to maximise utilization of the hardware?

Following is the batch script I wrote. I use export OMP_NUM_THREADS=2 because the last MPI process has 2 threads, which forces me to assume that the other processes have 2 threads each as well.

I then allocate 6 MPI processes per node, so that each node can run 6 × OMP_NUM_THREADS = 12 threads (the number of cores on each node), despite the fact that all MPI processes but one actually have only 1 thread.

#BSUB -J LOOP.N200.L1000_SIMPLE_THREAD
#BSUB -o LOOP.%J
#BSUB -W 00:10
#BSUB -M 1024
#BSUB -N
#BSUB -a openmpi
#BSUB -n 20
#BSUB -m xxx
#BSUB -R "span[ptile=6]"
#BSUB -x

export OMP_NUM_THREADS=2

How can I write a better script for this?


Solution

  • The following should work if you'd like the last rank to be the hybrid one (note span[ptile=12] rather than 6: the 20 ranks then occupy one full 12-core node plus 8 slots on a second, exclusively held node, so the last, hybrid rank has a spare core for its second thread):

    #BSUB -n 20
    #BSUB -R "span[ptile=12]"
    #BSUB -x
    
    $MPIEXEC $FLAGS_MPI_BATCH -n 19 -x OMP_NUM_THREADS=1 ./program : \
             $FLAGS_MPI_BATCH -n 1  -x OMP_NUM_THREADS=2 ./program
    

    If you'd like rank 0 to be the hybrid one, simply switch the two lines:

    $MPIEXEC $FLAGS_MPI_BATCH -n 1  -x OMP_NUM_THREADS=2 ./program : \
             $FLAGS_MPI_BATCH -n 19 -x OMP_NUM_THREADS=1 ./program
    

    This utilises the ability of Open MPI to launch MIMD programs.
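    If you want a quick sanity check of how the colon-separated contexts map to ranks before submitting (contexts are numbered in order, so the second one becomes the last rank), an interactive run along these lines should do. This is only an illustrative sketch, run outside LSF and assuming a plain Open MPI mpiexec on your PATH:

    # ranks 0-2 come from the first context, rank 3 from the second
    mpiexec -n 3 sh -c 'echo "pure MPI rank $OMPI_COMM_WORLD_RANK"' : \
            -n 1 sh -c 'echo "hybrid rank $OMPI_COMM_WORLD_RANK"'
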

    You mention that your hybrid rank uses POSIX threads and yet you are setting an OpenMP-related environment variable. If you are not really using OpenMP, you don't have to set OMP_NUM_THREADS at all and this simple mpiexec command should suffice:

    $MPIEXEC $FLAGS_MPI_BATCH ./program
    

    (in case my guess about the educational institution where you study or work turns out to be wrong, remove $FLAGS_MPI_BATCH and replace $MPIEXEC with mpiexec)
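
    For example, the two-context launch from above would then simply become the same command with the site-specific wrapper variables dropped (assuming mpiexec on your PATH is Open MPI's):

    mpiexec -n 19 -x OMP_NUM_THREADS=1 ./program : \
            -n 1  -x OMP_NUM_THREADS=2 ./program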