I don't have much experience with MPI, and I am trying to understand how allreduce works. Below is a simple example inspired by this IPython tutorial. Two MPI engines are launched on a local machine from the IPython notebook dashboard:
In [1]: import numpy as np
from IPython.parallel import Client
In [2]: c = Client(profile='mpi')
In [3]: view = c[:]
In [4]: view.scatter('a', np.arange(4.))
Out[4]: <AsyncResult: scatter>
In [5]: %%px
from mpi4py import MPI
import numpy as np
print MPI.COMM_WORLD.allreduce(np.sum(a), op=MPI.SUM)
[stdout:0] 1.0
[stdout:1] 5.0
I would have expected each engine to print "6.0", as in the IPython tutorial: engine 0 holds [0., 1.] (local sum 1.0) and engine 1 holds [2., 3.] (local sum 5.0), so the allreduce should combine them into 1.0 + 5.0 = 6.0 on both engines. Here it is as if the reduction was never performed. It's probably something simple, but I don't see what I am doing wrong.
I use:
That is the behavior you would see if your engines were not actually started with MPI. Each engine is then alone in its own MPI_COMM_WORLD of size one, so allreduce has no peers to reduce with: it just returns the value of np.sum(a) on each engine, which is what you are seeing.
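You can reproduce this in a plain Python session, without any IPython machinery at all (a minimal sketch; the 5.0 is just an arbitrary value):

from mpi4py import MPI

comm = MPI.COMM_WORLD
print(comm.Get_size())                    # 1: this process has no MPI peers
print(comm.allreduce(5.0, op=MPI.SUM))    # 5.0: a reduction over a single rank returns its input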
It's a good idea to check that MPI is set up properly:
%px print MPI.COMM_WORLD.Get_rank(), MPI.COMM_WORLD.Get_size()
If your engines are not in the same MPI world, your output will look like this:
[stdout:0] 0 1
[stdout:1] 0 1
And if they are:
[stdout:0] 0 2
[stdout:1] 1 2
Make sure you start your engines with MPI. For example:
ipcluster start --engines MPI
Or add to ipcluster_config.py:
c.IPClusterEngines.engine_launcher_class = 'MPI'
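For context, a minimal sketch of that config file (assuming the 'mpi' profile in the default IPython directory, so the file would live at ~/.ipython/profile_mpi/ipcluster_config.py):

# ~/.ipython/profile_mpi/ipcluster_config.py
c = get_config()
# Launch the engines via mpiexec so they all join one MPI world
c.IPClusterEngines.engine_launcher_class = 'MPI'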
Or just do it manually, without any configuration (this is all the above configuration does anyway), keeping in mind that the engines still need a running controller (ipcontroller) to connect to:
mpiexec -n 4 ipengine
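To rule out problems with the MPI installation itself, you can also test the reduction outside IPython entirely. A minimal sketch, with a hypothetical file name check_mpi.py, to be run as mpiexec -n 2 python check_mpi.py; every rank should print 6.0:

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
# Mirror the scatter above: rank 0 gets [0., 1.], rank 1 gets [2., 3.]
a = np.arange(4.)[2 * comm.Get_rank():2 * comm.Get_rank() + 2]
print(comm.allreduce(np.sum(a), op=MPI.SUM))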