I'm currently developing a hybrid program in C++ that uses both OpenMP and MPI. However, I don't know how to specify the number of MPI processes and threads for my job. Let's say that I want to use 5 nodes, with one MPI process on each node and 24 OpenMP threads per node.
This is how I submit my_job right now:
qsub -l select=5:ncpus=24:mpiprocs=5 -l place=scatter:exclhost my_job
and in the my_job script, I'm doing this:
#PBS -l select=5:ncpus=24:mpiprocs=5
export OMP_NUM_THREADS=24
mpic++ -O3 myprogram.cpp -o out -fopenmp -lquadmath -std=gnu++11
mpirun -n 5 ./out
However, the program runs very slowly, which makes me think that there might be a problem with how I'm allocating my resources.
Any suggestions?
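In a select statement, mpiprocs is counted per chunk (here, per node), not for the whole job, so mpiprocs=5 asks for 5 MPI processes on every node. Per https://www2.cisl.ucar.edu/resources/computational-systems/cheyenne/running-jobs/pbs-pro-job-script-examples, one MPI process with 24 OpenMP threads on each node would be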
#PBS -l select=5:ncpus=24:mpiprocs=1:ompthreads=24
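To put that together, here is a minimal sketch of a complete job script, assuming bash and an mpirun that picks up the node list from PBS. The variable names (nodes, ranks_per_node, threads_per_rank) are only illustrative, the place line is taken from the qsub command in the question, and the launch is guarded so that nothing is started when the script runs outside a PBS job.

```shell
#!/bin/bash
#PBS -l select=5:ncpus=24:mpiprocs=1:ompthreads=24
#PBS -l place=scatter:exclhost

# Illustrative values mirroring the select line above (assumed names, adjust to your job).
nodes=5
ranks_per_node=1
threads_per_rank=24

# One rank per node gives 5 ranks in total; each node runs 1 x 24 = 24 threads.
total_ranks=$((nodes * ranks_per_node))
threads_per_node=$((ranks_per_node * threads_per_rank))
echo "MPI ranks: $total_ranks, OpenMP threads per node: $threads_per_node"

# Set the thread count explicitly as well, in case it is not propagated to every node.
export OMP_NUM_THREADS=$threads_per_rank

# Launch only inside a PBS job, where PBS_NODEFILE is set.
if [ -n "${PBS_NODEFILE:-}" ]; then
    cd "$PBS_O_WORKDIR" || exit 1
    # Sanity check: with mpiprocs=1 each node should be listed once.
    sort "$PBS_NODEFILE" | uniq -c
    # Assumes ./out was already built with mpic++ and -fopenmp.
    mpirun -n "$total_ranks" ./out
fi
```

Process and thread pinning options differ between MPI implementations, so check your site's documentation if the threads still end up sharing cores.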