I'm trying to spawn a set of worker processes across several hosts using MPI4py and OpenMPI, but the spawn command seems to ignore my host file. I've posted my full test, but here are the key parts:
Based on a forum discussion, my manager script calls spawn with the hostfile option:
mpi_info = MPI.Info.Create()
mpi_info.Set("hostfile", "worker_hosts")
comm = MPI.COMM_SELF.Spawn(sys.executable,
                           args=['testworker.py'],
                           maxprocs=args.worker_count,
                           info=mpi_info).Merge()
In the worker_hosts file, I list the nodes in my Scyld Beowulf cluster:
myhead1 slots=2
mycompute1 slots=2
mycompute2 slots=2
mycompute3 slots=2
mycompute4 slots=3
The manager and the workers all call MPI.Get_processor_name(), but they all report "myhead1". If I use the same host file with mpirun, it works:
> mpirun -hostfile worker_hosts -np 3 python -c "from mpi4py import MPI; print MPI.Get_processor_name()"
myhead1
myhead1
mycompute1
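To make the expected placement concrete, here is a minimal sketch that parses the worker_hosts contents shown above and fills slots in file order. The helper names parse_hostfile and place_by_slot are hypothetical, and the fill-by-slot rule and the default of one slot for a host without a slots= field are assumptions, not a statement of what every OpenMPI version does. Under those assumptions it reproduces the three hosts mpirun printed.

```python
def parse_hostfile(text):
    """Return a list of (host, slots) pairs from hostfile text."""
    hosts = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        slots = 1  # assumption: default used here when no slots= field is given
        for field in fields[1:]:
            if field.startswith("slots="):
                slots = int(field.split("=", 1)[1])
        hosts.append((fields[0], slots))
    return hosts


def place_by_slot(hosts, nprocs):
    """Fill each host's slots in file order until nprocs ranks are placed."""
    placement = []
    for host, slots in hosts:
        for _ in range(slots):
            if len(placement) == nprocs:
                return placement
            placement.append(host)
    return placement


# Contents of worker_hosts as listed in the question.
HOSTFILE = """\
myhead1 slots=2
mycompute1 slots=2
mycompute2 slots=2
mycompute3 slots=2
mycompute4 slots=3
"""

hosts = parse_hostfile(HOSTFILE)
total_slots = sum(slots for _, slots in hosts)
print(total_slots)              # 11
print(place_by_slot(hosts, 3))  # ['myhead1', 'myhead1', 'mycompute1']
```

If the spawned workers honored the host file, I would expect them to spread out the same way instead of all landing on myhead1.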
If I change the name of the host file to something that doesn't exist, like bogus_file, I get an error:
--------------------------------------------------------------------------
Open RTE was unable to open the hostfile:
bogus_file
Check to make sure the path and filename are correct.
--------------------------------------------------------------------------
[Bulbasaur:86523] [[3458,0],0] ORTE_ERROR_LOG: Not found in file base/rmaps_base_support_fns.c at line 83
[Bulbasaur:86523] [[3458,0],0] ORTE_ERROR_LOG: Not found in file rmaps_rr.c at line 82
[Bulbasaur:86523] [[3458,0],0] ORTE_ERROR_LOG: Not found in file base/rmaps_base_map_job.c at line 88
[Bulbasaur:86523] [[3458,0],0] ORTE_ERROR_LOG: Not found in file base/plm_base_launch_support.c at line 105
[Bulbasaur:86523] [[3458,0],0] ORTE_ERROR_LOG: Not found in file plm_rsh_module.c at line 1173
So OpenMPI has noticed the hostfile option; it just doesn't seem to use it. The hostfile option is listed in the OpenMPI documentation:
Key        Type     Description
---        ----     -----------
host       char *   Host on which the process should be spawned.
                    See the orte_host man page for an
                    explanation of how this will be used.
hostfile   char *   Hostfile containing the hosts on which
                    the processes are to be spawned. See
                    the orte_hostfile man page for an
                    explanation of how this will be used.
How can I specify the host file for a spawn request?
I found a more recent version of the OpenMPI documentation that gave me the magic option:
Key            Type     Description
---            ----     -----------
host           char *   Host on which the process should be
                        spawned. See the orte_host man
                        page for an explanation of how this
                        will be used.
hostfile       char *   Hostfile containing the hosts on which
                        the processes are to be spawned. See
                        the orte_hostfile man page for
                        an explanation of how this will be
                        used.
add-host       char *   Add the specified host to the list of
                        hosts known to this job and use it for
                        the associated process. This will be
                        used similarly to the -host option.
add-hostfile   char *   Hostfile containing hosts to be added
                        to the list of hosts known to this job
                        and use it for the associated
                        process. This will be used similarly
                        to the -hostfile option.
If I change to using add-hostfile, it works perfectly:
mpi_info.Set("add-hostfile", "worker_hosts")
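Putting that together, here is a rough sketch of how the manager's spawn call might look with the new key. The function names spawn_info_items and spawn_workers are hypothetical helpers I made up for illustration; the MPI calls are the same ones used in the question. mpi4py is imported inside the function so the key selection can be checked on a machine without MPI installed.

```python
import sys


def spawn_info_items(hostfile, use_add_hostfile=True):
    """Pick the info key for the spawn request.

    "add-hostfile" is the key that worked for me; "hostfile" is kept only
    as the fallback for the older behaviour described in the question.
    """
    key = "add-hostfile" if use_add_hostfile else "hostfile"
    return {key: hostfile}


def spawn_workers(worker_script, worker_count, hostfile):
    """Spawn worker_count copies of worker_script and merge the communicators."""
    from mpi4py import MPI  # imported lazily; needs a working MPI installation

    mpi_info = MPI.Info.Create()
    for key, value in spawn_info_items(hostfile).items():
        mpi_info.Set(key, value)

    intercomm = MPI.COMM_SELF.Spawn(sys.executable,
                                    args=[worker_script],
                                    maxprocs=worker_count,
                                    info=mpi_info)
    mpi_info.Free()
    # Merge the manager and the spawned workers into one intracommunicator.
    return intercomm.Merge()


print(spawn_info_items("worker_hosts"))
```

In the manager script, the original call would then become something like comm = spawn_workers('testworker.py', args.worker_count, 'worker_hosts').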
If you're stuck using the older version of OpenMPI, try running the manager script with mpirun and the same hostfile. That also worked when I was still using the hostfile option.
mpirun -hostfile worker_hosts -np 1 python testmanager.py