mpi distributed-computing hpc

How to do an MPI_Scatter in MPI to all nodes except the root?


In MPI, if I perform an MPI_Scatter on MPI_COMM_WORLD, all the nodes receive some data (including the sending root).

How do I perform an MPI_Scatter from a root node to all the other nodes and make sure the root node does not receive any data?

Is creating a new MPI_Comm containing all the nodes but the root a viable approach?


Solution

  • Let's imagine your code looks like this:

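    int rank, size;    // rank of the process and size of the communicator
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );
    MPI_Comm_size( MPI_COMM_WORLD, &size );
    int root = 0;      // root process of our scatter
    int recvCount = 4; // number of elements each process receives
    // the send buffer is only significant on the root
    double *sendBuf = rank == root ? new double[recvCount * size] : NULL;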
    double *recvBuf = new double[recvCount];
    
    MPI_Scatter( sendBuf, recvCount, MPI_DOUBLE,
                 recvBuf, recvCount, MPI_DOUBLE,
                 root, MPI_COMM_WORLD );
    

    Here the root process does indeed send a segment to itself, although this can be avoided. Two obvious methods come to mind.

    Using MPI_IN_PLACE
    The call to MPI_Scatter() does not have to change. Only the definition of the receive buffer changes: on the root it becomes MPI_IN_PLACE, which tells MPI to leave the root's own segment where it is in sendBuf instead of copying it:

    double *recvBuf = rank == root ?
                      static_cast<double*>( MPI_IN_PLACE ) :
                      new double[recvCount];
    

    Using MPI_Scatterv()
    Here you define an array of integers giving the number of elements sent to each process and an array of displacements giving the starting index of each segment in the send buffer. Both are passed to MPI_Scatterv(), which replaces your call to MPI_Scatter() like this:

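    // requires #include <vector>; sendCounts and displs are only read on the root
    std::vector<int> sendCounts( size, recvCount ); // everybody receives recvCount elements
    sendCounts[root] = 0;                           // except the root process
    std::vector<int> displs( size );
    for ( int i = 0; i < size; i++ ) {
        displs[i] = i * recvCount;
    }
    
    // the root's receive count has to match sendCounts[root], i.e. 0
    MPI_Scatterv( sendBuf, sendCounts.data(), displs.data(), MPI_DOUBLE,
                  recvBuf, rank == root ? 0 : recvCount, MPI_DOUBLE,
                  root, MPI_COMM_WORLD );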
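
    The displacements above leave an unused gap of recvCount elements at the root's position in the send buffer. As a variant, the segments can be packed back to back so that the send buffer only needs (size - 1) * recvCount elements. The sketch below computes such a layout in plain C++; ScatterLayout and layoutExcludingRoot are hypothetical names made up for this illustration, not part of MPI.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper type (not part of MPI): the two arrays that
// MPI_Scatterv expects on the root, plus the required send buffer size.
struct ScatterLayout {
    std::vector<int> counts;  // number of elements sent to each rank
    std::vector<int> displs;  // start of each rank's segment in the send buffer
    int total;                // number of elements the send buffer must hold
};

// Every rank except the root gets perRank elements; segments are packed
// back to back, so no space is reserved for the root.
ScatterLayout layoutExcludingRoot(int size, int root, int perRank) {
    assert(size > 0 && root >= 0 && root < size && perRank >= 0);
    ScatterLayout layout{std::vector<int>(size, perRank),
                         std::vector<int>(size, 0), 0};
    layout.counts[root] = 0;  // the root receives nothing
    for (int i = 0; i < size; ++i) {
        layout.displs[i] = layout.total;   // next free position
        layout.total += layout.counts[i];  // advance by this rank's count
    }
    return layout;
}
```

    On the root, the send buffer would then be allocated with layout.total elements, and layout.counts.data() and layout.displs.data() would be passed to MPI_Scatterv(). The root still passes a receive count of 0, and every other rank passes perRank.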
    

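    Of course, in both cases the root's receive buffer holds no data after the call, and your code has to account for this. In particular, with MPI_IN_PLACE, recvBuf on the root is not an allocation, so it must not be read or passed to delete[].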
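
    One way to account for this in the MPI_IN_PLACE variant is to route every access to "my segment" through a small helper. With MPI_IN_PLACE, the root's segment is never moved and stays in sendBuf at offset root * recvCount. The following is a minimal sketch in plain C++; localSegment is a hypothetical name chosen for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: returns where this process's segment lives after an
// MPI_Scatter in which the root passed MPI_IN_PLACE as its receive buffer.
double* localSegment(int rank, int root, double* sendBuf, double* recvBuf,
                     int recvCount) {
    assert(recvCount >= 0);
    if (rank == root) {
        // The root's segment was not copied; it is still inside sendBuf.
        return sendBuf + static_cast<std::ptrdiff_t>(root) * recvCount;
    }
    // All other ranks received their segment into recvBuf.
    return recvBuf;
}
```

    This only matters if the root has a share of the data at all. If the root is purely a distributor, it simply never touches its segment, and with the MPI_Scatterv() variant and a send count of 0 it has no segment to begin with.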

    I personally prefer the first option, but both work.