Tags: python, parallel-processing, mpi, mpi4py

Alternative to mpirun inside python script


I'm facing this curious issue. I don't think the answer is difficult but I can't find it anywhere so I rely on your suggestions.

Suppose I have a parallel python function inside file parallel_func.py:

    from optparse import OptionParser

    def parallel_func():
        # Parse the command line; --parallel switches MPI on
        parser = OptionParser()
        parser.add_option("-f", "--file", dest="filename",
                          help="Input FILE", metavar="FILE")
        parser.add_option("--parallel", action="store_true",
                          help="Specify if we need to initialize MPI",
                          dest="with_MPI", default=False)
        (options, args) = parser.parse_args()

        if options.with_MPI:
            # Import mpi4py only when a parallel run is requested
            from mpi4py import MPI
            comm = MPI.COMM_WORLD
            myid = comm.Get_rank()
            numberPart = comm.Get_size()
            have_MPI = True
        else:
            # Serial fallback: a single partition, no communicator
            comm = 0
            myid = 0
            numberPart = 1
            have_MPI = False
        # etc. etc.

I can call this function from the shell simply typing:

    mpirun -np XX parallel_func.py -f input_file --parallel

Now, is there a way to call parallel_func as a function from inside a Python script that is itself not parallel and runs on a single core? My current version works, but it only runs parallel_func on a single core:

    from optparse import OptionParser

    # Import by module name, without the .py suffix
    from parallel_func import parallel_func

    # -------------------------------------------------------------------
    #  Main
    # -------------------------------------------------------------------

    def main():
        parser = OptionParser()
        parser.add_option("-f", "--file", dest="filename",
                          help="Input FILE", metavar="FILE")
        parser.add_option("-n", "--partitions", dest="partitions",
                          type="int", default=1,
                          help="number of PARTITIONS",
                          metavar="PARTITIONS")

        (options, args) = parser.parse_args()

        # Use MPI only when more than one partition is requested
        options.with_MPI = options.partitions > 1

        # Assumes a variant of parallel_func that accepts the parsed options
        parallel_func(options)

Long story short, is there a way to pass, from within the Python function main, all the information that MPI.COMM_WORLD needs so that parallel_func actually runs in parallel?

Thanks in advance for the answer!!


Solution

  • You are looking for dynamic process management. This means a single running process spawns additional processes at runtime, and together they form a communicator. You can find an example here. There are several drawbacks to this approach:

    • You need to provide a special script or entry point for the launched process - you cannot just resume at the point of the call to spawn
    • The spawned processes are in an inter-communicator containing two separate groups (the parent and the spawned processes). An inter-communicator is used differently from a normal intra-communicator.
    • Some HPC systems / batch systems do not support MPI process spawning

    Take this into account when evaluating how you design your application.
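
To make the spawning approach concrete, here is a minimal sketch using mpi4py's Spawn, Get_parent and Merge. The function names (build_worker_args, run_in_parallel, worker_main) and the idea of reusing parallel_func.py as the worker script are assumptions made for illustration, not part of the original code. The merge step turns the inter-communicator into an ordinary intra-communicator, which sidesteps the second drawback above. Note that the spawned processes also get their own MPI.COMM_WORLD containing only the workers.

```python
import sys


def build_worker_args(script, filename):
    """Arguments passed to each spawned Python interpreter (hypothetical helper)."""
    return [script, "-f", filename, "--parallel"]


def run_in_parallel(filename, partitions, script="parallel_func.py"):
    """Parent side: called from the serial script, spawns `partitions` workers."""
    from mpi4py import MPI  # imported lazily, as in the question

    # Spawn returns an inter-communicator linking the parent and the workers.
    intercomm = MPI.COMM_SELF.Spawn(
        sys.executable,
        args=build_worker_args(script, filename),
        maxprocs=partitions,
    )
    # Merge both groups into one intra-communicator; with high=False here
    # and high=True in the workers, the parent becomes rank 0.
    comm = intercomm.Merge(high=False)
    results = comm.gather(None, root=0)  # one entry per rank, parent first
    comm.Free()
    intercomm.Disconnect()
    return results[1:]  # drop the parent's own placeholder entry


def worker_main():
    """Worker side: what the spawned script would call at start-up."""
    from mpi4py import MPI

    parent = MPI.Comm.Get_parent()  # inter-communicator back to the parent
    comm = parent.Merge(high=True)  # must match the parent's Merge call
    result = comm.Get_rank()        # placeholder for the real computation
    comm.gather(result, root=0)
    comm.Free()
    parent.Disconnect()
```

In this sketch the serial main() would call run_in_parallel(options.filename, options.partitions) when more than one partition is requested, and the worker script would call worker_main() instead of relying on mpirun. Whether Spawn is available at all still depends on the MPI installation and batch system, as noted in the last drawback.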