Tags: mpi, mpich

Why does this MPI communication example run without deadlocking?


This is my MPI program. On lines 29 and 30 I send two messages with tags 98 and 99, respectively, but on lines 33 and 34 the other process receives the message with tag 99 first and then the one with tag 98. In my understanding, because MPI_Send() and MPI_Recv() are blocking functions, these four calls should end up waiting for each other, but that is not what happens when the program actually runs. During debugging, the MPI_Send() calls are not blocked either. Why is that?

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define N 1

int main(int argc, char **argv) {

#ifdef DEBUG
    int i = 0;          /* spin until a debugger attaches and sets i != 0 */
    while (0 == i) {
        sleep(1);
    }
#endif

    int myrank, dest;
    int my_int[N], get_int[N];
    
    MPI_Status status;
    
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    my_int[0] = myrank;
    get_int[0] = myrank + 1;
    dest = (myrank == 0) ? 1 : 0;
    
    if (myrank == 0) {
        MPI_Send(my_int, N, MPI_INT, dest, 98, MPI_COMM_WORLD);
        MPI_Send(get_int, N, MPI_INT, dest, 99, MPI_COMM_WORLD);
    } else {
        printf("myrank: %d my_int = %d get_int = %d.\n", myrank, my_int[0],
               get_int[0]);
        MPI_Recv(get_int, N, MPI_INT, 0, 99, MPI_COMM_WORLD, &status);
        MPI_Recv(my_int, N, MPI_INT, 0, 98, MPI_COMM_WORLD, &status);
        printf("myrank: %d my_int = %d get_int = %d.\n", myrank, my_int[0],
               get_int[0]);
    }
    
    MPI_Finalize();
    return 0;

}
mpirun -n 2 ./debug.out
myrank: 1 my_int = 1 get_int = 2.
myrank: 1 my_int = 0 get_int = 1.

I thought the two processes started by this program would end up deadlocked, but they don't.


Solution

  • There are three basic protocols for sending messages in MPI; actual implementations add their own subtypes and variations on top of them.

    In general, they are the short, eager, and rendezvous protocols.

    1. In the short protocol, the data is sent along with the envelope (an MPI message consists of an envelope and data; the envelope carries the tag, communicator, source/destination, etc.) and stored in a preallocated buffer at the receiver.

    2. In the eager protocol, the message is sent on the assumption that the receiving process has buffers allocated and can store it. This requires buffering at the receiver, and it means the sender completes its MPI_Send() call as soon as the message has been sent, even though it has not yet been received on the other side.

    3. In the rendezvous protocol, the message is not sent until the sender and the destination have negotiated, i.e., until a matching receive has been posted.

    As @Gilles Gouaillardet mentioned in the comment, in your case the eager protocol is most likely being used, so there is (presumably) no deadlock. If you increase the message size step by step, you will eventually see the processes deadlock. In some MPI implementations you can set this eager limit yourself and experiment with it (e.g. the btl_vader_eager_limit parameter in Open MPI); a minimal sketch that triggers the deadlock follows.
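
    For illustration, here is a minimal variant of the program from the question with a much larger payload (the value of N below is an assumption; the exact eager threshold depends on the MPI implementation and transport). With a message this large, MPI_Send() falls back to the rendezvous protocol, rank 0 blocks on the tag-98 send while rank 1 blocks on the tag-99 receive, and the run hangs:

    #include <mpi.h>
    #include <stdio.h>

    /* ~4 MB per message: well beyond typical eager limits (a few KB). */
    #define N (1024 * 1024)

    static int buf_a[N], buf_b[N];   /* static: too large for the stack */

    int main(int argc, char **argv) {
        int myrank;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

        if (myrank == 0) {
            /* Rendezvous: this send waits for a matching tag-98 receive... */
            MPI_Send(buf_a, N, MPI_INT, 1, 98, MPI_COMM_WORLD);
            MPI_Send(buf_b, N, MPI_INT, 1, 99, MPI_COMM_WORLD);
        } else {
            /* ...but rank 1 is waiting for tag 99 first: deadlock. */
            MPI_Recv(buf_b, N, MPI_INT, 0, 99, MPI_COMM_WORLD, &status);
            MPI_Recv(buf_a, N, MPI_INT, 0, 98, MPI_COMM_WORLD, &status);
            printf("never reached\n");
        }

        MPI_Finalize();
        return 0;
    }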

    In other words, MPI_Send() may or may not block; this is implementation specific. It blocks until the sender can safely reuse the send buffer: some implementations return to the caller once the buffer has been handed off to a lower communication layer, while others return only when there is a matching MPI_Recv() at the other end. So whether this program deadlocks is up to your MPI implementation, and it is your responsibility to avoid deadlocks :). See this SO thread. A portable, deadlock-free variant of the exchange is sketched below.
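
    One portable way to make this kind of exchange safe regardless of message size and protocol (a sketch of one option, not the only one) is to post both receives as non-blocking MPI_Irecv() calls and wait for them together; since tag matching is independent of posting order, neither side can get stuck:

    #include <mpi.h>
    #include <stdio.h>

    #define N 1

    int main(int argc, char **argv) {
        int myrank, my_int[N] = {0}, get_int[N] = {0};

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

        if (myrank == 0) {
            my_int[0] = 0;
            get_int[0] = 1;
            MPI_Send(my_int, N, MPI_INT, 1, 98, MPI_COMM_WORLD);
            MPI_Send(get_int, N, MPI_INT, 1, 99, MPI_COMM_WORLD);
        } else {
            MPI_Request reqs[2];
            /* Post both receives without blocking; the posting order no
             * longer matters because neither call waits for its message. */
            MPI_Irecv(get_int, N, MPI_INT, 0, 99, MPI_COMM_WORLD, &reqs[0]);
            MPI_Irecv(my_int, N, MPI_INT, 0, 98, MPI_COMM_WORLD, &reqs[1]);
            MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
            printf("myrank: %d my_int = %d get_int = %d.\n",
                   myrank, my_int[0], get_int[0]);
        }

        MPI_Finalize();
        return 0;
    }

    Other common options include MPI_Sendrecv(), or making rank 0 use non-blocking MPI_Isend() calls followed by MPI_Waitall().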