I am practicing with a simple non-blocking "Hello world" program from this website.
#include <iostream>
#include <mpi.h>
#include <unistd.h>
int main(int argc, char* argv[])
{
    MPI_Init(&argc, &argv);
    MPI_Request request;
    MPI_Status status;
    int size, rank, data;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (rank > 0) {
        MPI_Irecv(&data, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD, &request);
        std::cout << "Rank " << rank << " has received message with data " << data
                  << " from rank " << rank - 1 << std::endl;
    }
    std::cout << "Hello from rank " << rank << " out of " << size << std::endl;
    data = rank;
    MPI_Isend(&data, 1, MPI_INT, (rank + 1) % size, 0, MPI_COMM_WORLD, &request);
    MPI_Finalize();
    return 0;
}
I have a couple of problems: the first one is that (rank + 1) % size does not make sense to me. I expect this to be just rank + 1 rather than (rank + 1) % size. But when I delete % size, the code does not run. The second ambiguity I have is the output of this particular code, which is:
#PTP job_id=12493
Rank 3 has received message with data 21848 from rank 2
Hello from rank 3 out of 4
Hello from rank 0 out of 4
Rank 2 has received message with data 22065 from rank 1
Hello from rank 2 out of 4
Rank 1 has received message with data 22043 from rank 0
Hello from rank 1 out of 4
I have defined data to be equal to rank, but it seems to print something random. Why is this?
TL;DR The main issue in your code (and likely the reason data displays random values) is the use of MPI_Irecv and MPI_Isend without a corresponding call to MPI_Wait (or MPI_Test).
MPI_Irecv and MPI_Isend are nonblocking communication routines, so one needs to call MPI_Wait (or MPI_Test, to test for the completion of the request) to ensure that the operation has completed and that the data in the send/receive buffer can again be safely manipulated.
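As a rough sketch of the MPI_Test variant (reusing the variables from your program), one can poll for completion instead of blocking:
int flag = 0;
MPI_Irecv(&data, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD, &request);
while (!flag) {
    MPI_Test(&request, &flag, &status);  // non-blocking check for completion
    // ... do unrelated work here, but do NOT touch 'data' until flag != 0 ...
}
// the receive has completed, so 'data' can now be read safely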
Let us imagine that you send an array of ints using MPI_Isend without calling MPI_Wait; in that case you cannot be sure when it is safe to modify (or deallocate the memory of) that array. The same applies to MPI_Irecv. Calling MPI_Wait, on the other hand, ensures that from that point onwards one can read/write (or deallocate the memory of) the buffer without risking undefined behavior or inconsistent data.
During the MPI_Isend the content of the buffer (e.g., the array of ints) has to be read and sent; likewise, during the MPI_Irecv the content of the receiving buffer has to arrive. In the meantime, one can overlap some computation with the ongoing communication, but that computation must not change (or read) the content of the send/recv buffer. One then calls MPI_Wait to ensure that from that point onwards the data sent/received can be safely read or modified.
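As a rough sketch of that overlap pattern (dest and unrelated_work() are placeholders, not part of your code):
MPI_Isend(&data, 1, MPI_INT, dest, 0, MPI_COMM_WORLD, &request);
unrelated_work();              // computation that neither reads nor writes 'data'
MPI_Wait(&request, &status);   // from here on 'data' may be read or overwritten safely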
In your code, however, you call:
MPI_Irecv(&data, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD,&request);
without calling MPI_Wait afterwards. Moreover, you change the content of the buffer, i.e., data = rank;. As previously mentioned, this can lead to undefined behavior.
You can fix this problem either by using MPI_Recv and MPI_Send instead, or by calling MPI_Irecv and MPI_Isend followed by MPI_Wait. Semantically, a call to MPI_Isend() followed by MPI_Wait(), or a call to MPI_Irecv() followed by MPI_Wait(), is the same as calling MPI_Send() or MPI_Recv(), respectively.
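For instance, the receive part of your code could equivalently be written with the blocking call (a sketch of the first option):
MPI_Recv(&data, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD, &status);  // returns only after 'data' is filled in
std::cout << "Rank " << rank << " has received message with data " << data
          << " from rank " << rank - 1 << std::endl;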
I have a couple of problems: the first one is (rank + 1) % size does not make sense to me. I expect this to be just rank + 1 rather than (rank + 1) % size.
To explain the reasoning behind that expression, let us think about your code with 4 processes, with ranks ranging from 0 to 3.
Processes 1, 2, and 3 will call:
MPI_Irecv(&data, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD,&request);
then all four processes call (with (rank + 1) % size evaluated accordingly for each rank):
MPI_Isend(&data, 1, MPI_INT, (rank + 1) % size, 0, MPI_COMM_WORLD, &request);
So (rank + 1) % size is a trick to wrap around to the first rank once the last rank is reached. All this builds the following recv/send pattern: 0 -> 1 -> 2 -> 3 -> 0.
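If it helps, here is a tiny standalone loop (no MPI involved) showing what the expression evaluates to for size = 4:
#include <iostream>
int main() {
    const int size = 4;
    for (int rank = 0; rank < size; ++rank)
        std::cout << "rank " << rank << " sends to " << (rank + 1) % size << '\n';
    // prints: 0 sends to 1, 1 sends to 2, 2 sends to 3, 3 sends to 0
    return 0;
}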
As you might have noticed, the receive and the send are called 3 and 4 times, respectively. And even though you did not experience any issue because of this, a mismatch between the number of receive and send calls commonly causes deadlocks. This can be avoided if, instead of using (rank + 1) % size, you use just rank + 1 and filter out the last process from calling MPI_Isend, as follows:
if (rank + 1 < size)
    MPI_Isend(&data, 1, MPI_INT, rank + 1, 0, MPI_COMM_WORLD, &request);
This means that Process 3 will not send a message to Process 0, but that is fine, since Process 0 does not expect to receive any message anyway.
A Running Example:
#include <iostream>
#include <mpi.h>
#include <unistd.h>
int main(int argc, char* argv[])
{
    MPI_Init(&argc, &argv);
    MPI_Request request;
    MPI_Status status;
    int size, rank, data;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (rank > 0) {
        MPI_Irecv(&data, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD, &request);
        MPI_Wait(&request, &status);  // wait for completion before reading 'data'
        std::cout << "Rank " << rank << " has received message with data " << data
                  << " from rank " << rank - 1 << std::endl;
    }
    std::cout << "Hello from rank " << rank << " out of " << size << std::endl;
    data = rank;
    if (rank + 1 < size) {
        MPI_Isend(&data, 1, MPI_INT, rank + 1, 0, MPI_COMM_WORLD, &request);
        MPI_Wait(&request, &status);  // wait for completion before reusing 'data'
    }
    MPI_Finalize();
    return 0;
}
Output: with 4 processes, a possible output is:
Hello from rank 0 out of 4
Rank 1 has received message with data 0 from rank 0
Hello from rank 1 out of 4
Rank 2 has received message with data 1 from rank 1
Hello from rank 2 out of 4
Rank 3 has received message with data 2 from rank 2
Hello from rank 3 out of 4