
Why does this setup forming sub-communicators deadlock in mpi4py?


system: macOS 14.4.1
python: 3.11.8
mpi4py: 3.1.5
OpenMPI: 5.0.2 installed with homebrew

I have the following Python script. The deadlock happens when I have 3 MPI ranks. I want to create a sub-communicator for ranks 0 and 1, and another for ranks 1 and 2. This means that rank 0 only needs to know about the first sub-communicator, rank 2 only needs to know about the second, but rank 1 needs to know about both of them.

However, although rank 1 successfully creates the first communicator, it deadlocks when creating the second. Why is that?

from mpi4py import MPI
from typing import List

world_comm = MPI.COMM_WORLD
world_size = world_comm.Get_size()
world_rank = world_comm.Get_rank()

assert world_size == 3

communicator_ranks = [
    [(0, 1)],
    [(0, 1), (1, 2)],
    [(1, 2)]
]

communicators: List[MPI.Comm] = []
for ranks in communicator_ranks[world_rank]:
    group = world_comm.group.Incl(ranks)
    print(f"rank: {world_rank}, forming communictor for ranks: {ranks}")
    comm = world_comm.Create(group)
    print(f"rank: {world_rank}, forming communictor for ranks: {ranks} -- DONE")
    communicators.append(comm)

world_comm.barrier()
print("all done")

When running mpirun -n 3 python deadlock.py I get the following printout:

> mpirun -n 3 python deadlock.py
rank: 2, forming communicator for ranks: (1, 2)
rank: 1, forming communicator for ranks: (0, 1)
rank: 0, forming communicator for ranks: (0, 1)
rank: 0, forming communicator for ranks: (0, 1) -- DONE
rank: 2, forming communicator for ranks: (1, 2) -- DONE
rank: 1, forming communicator for ranks: (0, 1) -- DONE
rank: 1, forming communicator for ranks: (1, 2)

Note that the last entry has no DONE counterpart; the program just sits there waiting forever.


Solution

  • Your program is incorrect w.r.t. the MPI standard.

    From MPI 4.2 chapter 7.4 page 325:

    If an MPI process calls with a nonempty group then all MPI processes in that group must call the function with the same group as argument, that is the same MPI processes in the same order. Otherwise, the call is erroneous.

    MPI_Comm_create is collective over the parent communicator, so every rank of world_comm must take part in every call, with group arguments that are consistent across ranks. In your run rank 0 calls Create() once, rank 1 twice and rank 2 once, with groups that overlap, so the program is erroneous; what you observe is rank 1's second call blocking forever because ranks 0 and 2 never enter a matching call.

    You might try the mpi4py binding for MPI_Comm_create_from_group() instead, which is collective only over the group being created, or keep MPI_Comm_create and have every rank participate in every call, passing MPI_GROUP_EMPTY when it is not a member. Sketches of both approaches follow.
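
    Here is a minimal sketch of the MPI_GROUP_EMPTY route, assuming the full list of sub-communicators can be known to every rank: each rank loops over the same list in the same order and passes MPI.GROUP_EMPTY when it is not a member, so every Create() call is matched on all three ranks and non-members simply get back MPI.COMM_NULL.

    from mpi4py import MPI
    from typing import List

    world_comm = MPI.COMM_WORLD
    world_rank = world_comm.Get_rank()

    assert world_comm.Get_size() == 3

    # Every rank lists every sub-communicator, in the same order.
    all_subcomm_ranks = [(0, 1), (1, 2)]

    communicators: List[MPI.Comm] = []
    for ranks in all_subcomm_ranks:
        if world_rank in ranks:
            group = world_comm.group.Incl(ranks)
        else:
            group = MPI.GROUP_EMPTY  # take part in the collective without joining the group
        comm = world_comm.Create(group)  # collective over all of world_comm
        if comm != MPI.COMM_NULL:        # non-members receive MPI.COMM_NULL
            communicators.append(comm)

    world_comm.barrier()
    print("all done")

    MPI_Comm_create_from_group() itself is an MPI-4 call and may not be exposed by mpi4py 3.1.5; the closely related MPI-3 MPI_Comm_create_group (mpi4py's Comm.Create_group) is collective only over the group, so the original per-rank bookkeeping can stay exactly as it is. A sketch under that assumption:

    from mpi4py import MPI
    from typing import List

    world_comm = MPI.COMM_WORLD
    world_rank = world_comm.Get_rank()

    assert world_comm.Get_size() == 3

    # As in the original code: each rank lists only the sub-communicators it belongs to.
    communicator_ranks = [
        [(0, 1)],
        [(0, 1), (1, 2)],
        [(1, 2)],
    ]

    communicators: List[MPI.Comm] = []
    for ranks in communicator_ranks[world_rank]:
        group = world_comm.group.Incl(ranks)
        # Collective only over `group`, so ranks outside `ranks` never call it.
        comm = world_comm.Create_group(group)
        communicators.append(comm)

    world_comm.barrier()
    print("all done")

    If the same set of ranks ever needs to create several communicators concurrently, the optional tag argument of Create_group disambiguates the matching; here each rank issues its calls one after another, so the default is fine.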