I used to run MPI programs like this:
mpirun -np N ./c_or_python_script
However, on clusters with a PBS job submission queue (I don't know what this type of submission is called), the example scripts do not pass the -np N argument! I still give it. What is the difference? Below is a mock example script from our cluster.
#!/bin/sh
#PBS -V
#PBS -N mpi_job
#PBS -q normal
#PBS -A etc
#PBS -l select=4:ncpus=64:mpiprocs=64
#PBS -l walltime=04:00:00
cd "$PBS_O_WORKDIR"
mpirun ./test_mpi.exe
What would change if I switched to mpirun -np 256 ./test_mpi.exe?
Thank you. I am not an expert in this field.
P.S. With the bsub submission system, I understood the difference.
That depends on the MPI implementation you are using and on how tightly it integrates with the resource manager. For example, Open MPI has tight integration with many resource managers, including PBS, LSF, and SLURM. When run inside a batch job, it automatically discovers the details of the allocation and launches as many processes as there are allocated CPU slots, unless you tell it otherwise with -np.

In your case, you are asking for 4 nodes with 64 CPU slots each, for a total of 256 CPU slots, so passing -np 256 changes nothing. If you ask MPI to launch fewer processes, e.g., with -np 128, some CPU slots will remain unused. If you ask for more processes, e.g., with -np 300, Open MPI will usually complain and refuse to run the program unless you explicitly enable oversubscription with -oversubscribe, which may not be to the liking of the resource manager or the cluster administrators. Other MPI implementations work in a similar manner.
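
If you want to check the slot count on your own cluster, here is a minimal sketch you could put in the job script. It assumes that PBS writes one line per MPI process slot to the file named by $PBS_NODEFILE, which is the usual behaviour when mpiprocs is set but is worth verifying on your system. The helper name count_slots is hypothetical and introduced only for illustration.

```shell
#!/bin/sh
# count_slots FILE: print the number of lines in a PBS-style node file.
# Assumption: the file has one line per MPI process slot.
count_slots() {
    wc -l < "$1" | tr -d ' '
}

# PBS sets PBS_NODEFILE inside a job; outside a job this branch is skipped.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    NP=$(count_slots "$PBS_NODEFILE")
    echo "Allocated slots: $NP"
    # With tight integration this should behave like plain "mpirun ./test_mpi.exe".
    mpirun -np "$NP" ./test_mpi.exe
fi
```

With select=4:ncpus=64:mpiprocs=64, the node file would be expected to contain 4 × 64 = 256 lines under that assumption, so the explicit -np value matches what mpirun would pick on its own.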