Deadlock in Microsoft MPI with MPI_Isend


I have the following C/C++ code, built against Microsoft MPI:

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"
int main (int argc, char *argv[])
 {
  int  err, numtasks, taskid;
  int out=0,val;
  MPI_Status status;
  MPI_Request req;

  err=MPI_Init(&argc, &argv);
  err=MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
  err=MPI_Comm_rank(MPI_COMM_WORLD, &taskid);


  int receiver=(taskid+1)% numtasks;
  int sender= (taskid-1+numtasks)% numtasks;
  printf("sender %d, receiver %d, rank %d\n",sender,receiver, taskid);

  val=50;   
  MPI_Isend(&val, 1, MPI_INT, receiver, 1, MPI_COMM_WORLD, &req);
  MPI_Irecv(&out, 1, MPI_INT, sender, 1, MPI_COMM_WORLD, &req);
  printf ("Rank: %d , Value: %d\n", taskid, out );
  err=MPI_Finalize();
  return 0;
 }

The application deadlocks when launched with more than 2 processes. With 2 processes it runs, but nothing is ever written to "out". The same code works with a Linux MPI distribution, so the problem seems to be specific to the Microsoft version. Any help?


Solution

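  • Firstly, each MPI process performs two communications: a single send and a single receive. Each non-blocking call needs its own request handle, but your code passes the same req to both, so the handle for the send is overwritten by the receive. You need storage for two requests (MPI_Request req[2]) and, if you want to inspect them, two statuses (MPI_Status status[2]).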

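    Secondly, you need to wait after you call the non-blocking send/receive to ensure that both operations have completed. Until the wait returns, out may not have been written yet and val must not be modified.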

    MPI_Isend(&val, 1, MPI_INT, receiver, 1, MPI_COMM_WORLD, &req[0]);
    MPI_Irecv(&out, 1, MPI_INT, sender, 1, MPI_COMM_WORLD, &req[1]);
    
    // While the communication is happening, here you can overlap computation
    // on data that is NOT being currently communicated, with the communication of val/out
    
    MPI_Waitall(2, req, status);
    
    // Now both the send and receive have been finished for this process, 
    // and we can access out, assured that it is valid
    
    printf ("Rank: %d , Value: %d\n", taskid, out);
    

    As to why this worked on a Linux distribution and not the Microsoft one: I can only assume that, under the hood, the Linux implementation is effectively treating non-blocking communication as blocking communication. That is, they're "cheating" and finishing your communication for you inside the MPI_Isend/MPI_Irecv calls, before you have asked for completion. This makes it easier on them because they don't have to track as much information about the communication, but it also ruins your ability to overlap computation and communication. You should NOT rely on this to work: the receive buffer is only guaranteed to be valid once the request has been completed with a wait (or a successful test).