Spawn processes using host file with mpi4py


I'm trying to spawn a set of worker processes across several hosts using mpi4py and OpenMPI, but the spawn call seems to ignore my host file. I've posted my full test, but here are the key parts:

Based on a forum discussion, my manager script calls spawn with the hostfile option:

mpi_info = MPI.Info.Create()
mpi_info.Set("hostfile", "worker_hosts")

comm = MPI.COMM_SELF.Spawn(sys.executable,
                           args=['testworker.py'],
                           maxprocs=args.worker_count,
                           info=mpi_info).Merge()

In the worker_hosts file, I list the nodes in my Scyld Beowulf cluster:

myhead1 slots=2
mycompute1 slots=2
mycompute2 slots=2
mycompute3 slots=2
mycompute4 slots=3
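
As a quick sanity check before spawning, the host file can be parsed to see how many slots it offers in total. This is only a rough sketch: `parse_hostfile` is a hypothetical helper, not part of mpi4py or OpenMPI, and it only understands the simple `host slots=N` lines shown above. Treating a host without a `slots=` entry as one slot is an assumption here; OpenMPI's own default may differ.

```python
def parse_hostfile(text):
    """Return {host: slots} for lines such as 'mycompute1 slots=2'.

    Hypothetical helper: handles only the simple format used in this
    question. Blank lines and '#' comments are skipped, and a host with
    no slots= entry is counted as 1 slot (an assumption).
    """
    slots = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        count = 1
        for field in fields[1:]:
            if field.startswith("slots="):
                count = int(field[len("slots="):])
        slots[fields[0]] = count
    return slots


def total_slots(text):
    """Sum the slots over every host listed in the host file text."""
    return sum(parse_hostfile(text).values())
```

For the worker_hosts file above, `total_slots(open("worker_hosts").read())` gives 11, which can be compared against the worker count passed as maxprocs.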

The manager and the workers all call MPI.Get_processor_name(), but they all report "myhead1". If I use the same host file with mpirun it works:

> mpirun -hostfile worker_hosts -np 3 python -c "from mpi4py import MPI; print(MPI.Get_processor_name())"
myhead1
myhead1
mycompute1

If I change the name of the host file to something that doesn't exist, like bogus_file, I get an error:

--------------------------------------------------------------------------
Open RTE was unable to open the hostfile:
    bogus_file
Check to make sure the path and filename are correct.
--------------------------------------------------------------------------
[Bulbasaur:86523] [[3458,0],0] ORTE_ERROR_LOG: Not found in file base/rmaps_base_support_fns.c at line 83
[Bulbasaur:86523] [[3458,0],0] ORTE_ERROR_LOG: Not found in file rmaps_rr.c at line 82
[Bulbasaur:86523] [[3458,0],0] ORTE_ERROR_LOG: Not found in file base/rmaps_base_map_job.c at line 88
[Bulbasaur:86523] [[3458,0],0] ORTE_ERROR_LOG: Not found in file base/plm_base_launch_support.c at line 105
[Bulbasaur:86523] [[3458,0],0] ORTE_ERROR_LOG: Not found in file plm_rsh_module.c at line 1173

So OpenMPI has noticed the hostfile option; it just doesn't seem to use it. The hostfile option is listed in the OpenMPI documentation:

Key                   Type      Description
---                   ----      -----------
host                  char *    Host on which the process should be spawned.
                                See the orte_host man page for an
                                explanation of how this will be used.
hostfile              char *    Hostfile containing the hosts on which
                                the processes are to be spawned. See
                                the orte_hostfile man page for an
                                explanation of how this will be used.

How can I specify the host file for a spawn request?


Solution

  • I found a more recent version of the OpenMPI documentation that gave me the magic option:

    Key                    Type     Description
    ---                    ----     -----------
    host                   char *   Host on which the process should be
                                    spawned.  See the orte_host man
                                    page for an explanation of how this
                                    will be used.
    hostfile               char *   Hostfile containing the hosts on which
                                    the processes are to be spawned. See
                                    the orte_hostfile man page for
                                    an explanation of how this will be
                                    used.
    add-host               char *   Add the specified host to the list of
                                    hosts known to this job and use it for
                                    the associated process. This will be
                                    used similarly to the -host option.
    add-hostfile           char *   Hostfile containing hosts to be added
                                    to the list of hosts known to this job
                                    and use it for the associated
                                    process. This will be used similarly
                                    to the -hostfile option.
    

    If I change to using add-hostfile, it works perfectly:

    mpi_info.Set("add-hostfile", "worker_hosts")
    

    If you're stuck using the older version of OpenMPI, try running the manager script with mpirun and the same hostfile. That also worked when I was still using the hostfile option.

    mpirun -hostfile worker_hosts -np 1 python testmanager.py
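
For reference, the spawn call from the question can be wrapped so the info key is easy to switch between the two variants. Treat this as a minimal sketch rather than the exact code from my test: `spawn_info_items` and `spawn_workers` are hypothetical helper names, and only the "hostfile" and "add-hostfile" keys from the documentation quoted above are accepted.

```python
import sys


def spawn_info_items(hostfile, key="add-hostfile"):
    """Return the info key/value pair to pass to Spawn.

    Only the two host file keys discussed in this answer are allowed.
    """
    if key not in ("hostfile", "add-hostfile"):
        raise ValueError("unsupported info key: %r" % (key,))
    return {key: hostfile}


def spawn_workers(worker_script, worker_count, hostfile, key="add-hostfile"):
    """Spawn worker_count Python workers and return the merged communicator."""
    from mpi4py import MPI  # lazy import: the helper above needs no MPI

    info = MPI.Info.Create()
    for name, value in spawn_info_items(hostfile, key).items():
        info.Set(name, value)
    intercomm = MPI.COMM_SELF.Spawn(sys.executable,
                                    args=[worker_script],
                                    maxprocs=worker_count,
                                    info=info)
    info.Free()  # the info object is no longer needed once Spawn returns
    return intercomm.Merge()
```

The manager would then call something like `comm = spawn_workers('testworker.py', 3, 'worker_hosts')`, or pass `key="hostfile"` when it is launched under mpirun with the same host file.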