MPI: Multiple Overlapping Communicators


I want to create MPI communicators linking the process with rank 0 to every other process. Suppose n is the total number of processes. Then the process with rank 0 is supposed to have n-1 communicators while each of the other processes has one communicator. Is this possible, and, if it is, why can I not use the program below to achieve this?

Compiling the code below using the mpic++ compiler terminates without warnings and errors on my computer. But when I run the resulting program using 3 or more processes (mpiexec -n 3), it never terminates.

Likely I'm misunderstanding the concept of communicators in MPI. Maybe someone can help me understand why the program below gets stuck, and what is a better way to create those communicators? Thanks.

#include <cstdlib>
#include <iostream>
#include <vector>
#include <thread>
#include <chrono>
#include "mpi.h"
void FinalizeMPI();
void InitMPI(int argc, char** argv);
int main(int argc, char** argv) {
    InitMPI(argc, argv);

    int rank,comm_size;
    MPI_Comm_rank(MPI_COMM_WORLD,&rank);
    MPI_Comm_size(MPI_COMM_WORLD,&comm_size);

    if(comm_size<2) {
        FinalizeMPI();
        return 0;
    }

    MPI_Group GroupAll;
    MPI_Comm_group(MPI_COMM_WORLD, &GroupAll);

    if(rank==0) {
        std::vector<MPI_Group> myGroups(comm_size-1);
        std::vector<MPI_Comm> myComms(comm_size-1);
        for(int k=1;k<comm_size;++k) {
           int ranks[2]{0, k};
           MPI_Group_incl(GroupAll, 2, ranks, &myGroups[k-1]);
           int err = MPI_Comm_create(MPI_COMM_WORLD, myGroups[k-1], &myComms[k-1]);
           std::cout << "Error: " << err << std::endl;
        }
    } else {
        MPI_Group myGroup;
        MPI_Comm myComm;
        int ranks[2]{0,rank};
        MPI_Group_incl(GroupAll, 2, ranks, &myGroup);
        int err = MPI_Comm_create(MPI_COMM_WORLD, myGroup, &myComm);
        std::cout << "Error: " << err << std::endl;
    }
    std::cout << "Communicators created: " << rank << std::endl;
    std::this_thread::sleep_for(std::chrono::milliseconds(1000));

    FinalizeMPI();
    return 0;
}

void FinalizeMPI() {
   int flag;
   MPI_Finalized(&flag);
   if(!flag)
      MPI_Finalize();
}

void InitMPI(int argc, char** argv) {
   int flag;
   MPI_Initialized(&flag);
   if(!flag) {
      int provided_Support;
      MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided_Support);
      if(provided_Support!=MPI_THREAD_MULTIPLE) {
          exit(0);
      }
   }
}

Solution

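  • MPI_Comm_create is a collective operation on the parent communicator (here MPI_COMM_WORLD): every process in that communicator must call it, and the calls must match up. In your program rank 0 calls it n-1 times while every other rank calls it only once, and the ranks pass different, overlapping groups to the same call. With 3 or more processes, rank 0's later calls are never matched by the other ranks, so the program hangs.

    The simplest fix is to replace MPI_Comm_create with MPI_Comm_create_group (available since MPI-3) and keep the rest of your code as it is. It works like MPI_Comm_create except that it is collective only over the processes in the group, and it takes an additional tag argument.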