I don't have much experience with MPI, and I am trying to understand how allreduce works. Below is a simple example inspired by this IPython tutorial. Two MPI engines are launched on a local machine from the IPython notebook dashboard:
In [1]: import numpy as np
from IPython.parallel import Client
In [2]: c = Client(profile='mpi')
In [3]: view = c[:]
In [4]: view.scatter('a', np.arange(4.))
Out[4]: <AsyncResult: scatter>
In [5]: %%px
from mpi4py import MPI
import numpy as np
print MPI.COMM_WORLD.allreduce(np.sum(a), op=MPI.SUM)
[stdout:0] 1.0
[stdout:1] 5.0
I would have expected each engine to print "6.0", as in the IPython tutorial: engine 0 holds [0., 1.] (local sum 1.0) and engine 1 holds [2., 3.] (local sum 5.0), so the allreduce should combine them into 1.0 + 5.0 = 6.0 on both engines. Here it is as if the reduction was never performed. It's probably something simple, but I don't see what I am doing wrong.
I use:
That is the behavior you would see if your engines were not actually started with MPI. Each engine is then alone in its own MPI_COMM_WORLD of size one, so allreduce has no peers to reduce with: it just returns the value of np.sum(a) on each engine, which is what you are seeing.
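You can reproduce this in a plain Python session, without any IPython machinery at all (a minimal sketch; the 5.0 is just an arbitrary value):

from mpi4py import MPI

comm = MPI.COMM_WORLD
print(comm.Get_size())                    # 1: this process has no MPI peers
print(comm.allreduce(5.0, op=MPI.SUM))    # 5.0: a reduction over a single rank returns its input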
It's a good idea to check that MPI is set up properly:
%px print MPI.COMM_WORLD.Get_rank(), MPI.COMM_WORLD.Get_size()
If your engines are not in the same MPI world, your output will look like this:
[stdout:0] 0 1
[stdout:1] 0 1
And if they are:
[stdout:0] 0 2
[stdout:1] 1 2
Make sure you start your engines with MPI. For example:
ipcluster start --engines MPI
Or add to ipcluster_config.py:
c.IPClusterEngines.engine_launcher_class = 'MPI'
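For context, a minimal sketch of that config file (assuming the 'mpi' profile in the default IPython directory, so the file would live at ~/.ipython/profile_mpi/ipcluster_config.py):

# ~/.ipython/profile_mpi/ipcluster_config.py
c = get_config()
# Launch the engines via mpiexec so they all join one MPI world
c.IPClusterEngines.engine_launcher_class = 'MPI'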
Or just do it manually, without any configuration (this is all the above configuration does anyway), keeping in mind that the engines still need a running controller (ipcontroller) to connect to:
mpiexec -n 4 ipengine
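To rule out problems with the MPI installation itself, you can also test the reduction outside IPython entirely. A minimal sketch, with a hypothetical file name check_mpi.py, to be run as mpiexec -n 2 python check_mpi.py; every rank should print 6.0:

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
# Mirror the scatter above: rank 0 gets [0., 1.], rank 1 gets [2., 3.]
a = np.arange(4.)[2 * comm.Get_rank():2 * comm.Get_rank() + 2]
print(comm.allreduce(np.sum(a), op=MPI.SUM))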