I'm running a hybrid MPI + OpenMP program, and I found that with the commands below, even though OpenMP launches the number of threads I set, they all stick to one CPU core:
export OMP_NUM_THREADS=8
export OMP_PLACES=cores
export OMP_PROC_BIND=true
mpirun --host n1,n2,n3,n4 -np 4 a.out # the threads all stick to one core on each node
mpirun --host n1,n2,n3,n4 -np 4 grep Cpus_allowed_list /proc/self/status # inspect each rank's affinity mask
Cpus_allowed_list: 0
Cpus_allowed_list: 0
Cpus_allowed_list: 0
Cpus_allowed_list: 0
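Each rank's affinity mask contains only core 0, and OpenMP threads inherit their parent process's mask, so all 8 threads end up time-sliced on that single core. The cause is mpirun's default process binding: here it bound each rank to a single core. Assuming Open MPI (the --map-by/--bind-to syntax below is Open MPI's; other launchers use different flags), a minimal sketch that reserves enough cores per rank:

export OMP_NUM_THREADS=8
export OMP_PLACES=cores
export OMP_PROC_BIND=true
# PE=8 assigns each rank 8 processing elements (cores), so its OpenMP
# threads can spread across them; --report-bindings prints the
# resulting mask for each rank.
mpirun --host n1,n2,n3,n4 -np 4 --map-by node:PE=8 --report-bindings a.out
# Alternatively, turn off MPI-level binding entirely and let the OS
# and the OpenMP runtime place the threads:
mpirun --host n1,n2,n3,n4 -np 4 --bind-to none a.out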
Later on, I found this solution, which works fine on my cluster:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
echo "Nodelist: $SLURM_JOB_NODELIST"
echo "CoerPerTask: $SLURM_CPUS_PER_TASK"
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
mpirun --map-by node:PE=$SLURM_CPUS_PER_TASK ./main 14000
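Here --map-by node:PE=$SLURM_CPUS_PER_TASK tells Open MPI to give the rank 16 processing elements (cores) instead of one, so the 16 OpenMP threads can each land on their own core. To confirm the mask actually widened, you can repeat the earlier check under the same mapping (a sketch; the exact core list depends on your node's topology):

mpirun --map-by node:PE=$SLURM_CPUS_PER_TASK grep Cpus_allowed_list /proc/self/status
# expected: something like
# Cpus_allowed_list: 0-15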