Tags: c++, parallel-processing, mpi, cluster-computing, hpc

Barrier after MPI non-blocking call, without bookkeeping?


I'm doing a bunch of MPI_Iallreduce non-blocking communications. I've added these Iallreduce calls to several different places in my code. Every so often, I want to pause and wait for all the Iallreduce calls to finish.


Version 1 with MPI_Request bookkeeping -- this works:

MPI_Request requests[n];
MPI_Iallreduce(..., &requests[0]);
...
MPI_Iallreduce(..., &requests[n-1]);
for (int i = 0; i < n; i++) {
    MPI_Wait(&requests[i], MPI_STATUS_IGNORE);
}

But I'm working in a pretty big codebase, and I'd rather not write the extra code to keep track of all these MPI_Request objects. I'd like to do the following instead:

Version 2 without MPI_Request bookkeeping -- this segfaults:

MPI_Iallreduce(..., &requests[0]);
...
MPI_Iallreduce(..., &requests[n-1]);
MPI_Barrier(...); // wait for Iallreduces to finish, without MPI_Request bookkeeping

But the MPI_Barrier version segfaults.


Is there a way to do a bunch of non-blocking MPI calls, and then wait for the calls to finish, without keeping track of MPI_Request objects?


Solution

  • It depends on what exactly you mean by not wanting to "track request objects". In general, nothing guarantees the operations have completed other than waiting on the requests themselves; in particular, MPI_Barrier only synchronizes the ranks, it does not complete (or even guarantee progress on) outstanding non-blocking operations. However, the way you're waiting isn't the simplest way. Instead, use MPI_Waitall:

    MPI_Iallreduce(..., &requests[0]);
    ...
    MPI_Iallreduce(..., &requests[n-1]);
    MPI_Waitall(n, requests, MPI_STATUSES_IGNORE);
    

    This waits for all of the requests to complete at once; when it returns, you know all of your reductions have finished. If you want more fine-grained information about how each one went, replace MPI_STATUSES_IGNORE with an array of MPI_Status objects.
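
    As a rough sketch of how this can look without per-call bookkeeping scattered through the code, the requests can all be appended to a single std::vector<MPI_Request> and completed in one place. The program below is an illustrative assumption, not code from the question: the buffer names, the count of 3, and the use of MPI_INT, MPI_SUM, and MPI_COMM_WORLD are placeholders.

    #include <mpi.h>
    #include <cstdio>
    #include <vector>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const int n = 3;                 // placeholder: number of reductions
        std::vector<int> send(n, rank);  // placeholder input, one per reduction
        std::vector<int> recv(n, 0);     // one result slot per reduction

        // The only "bookkeeping": one vector that every call site appends to.
        std::vector<MPI_Request> requests;
        requests.reserve(n);

        for (int i = 0; i < n; ++i) {
            MPI_Request req;
            MPI_Iallreduce(&send[i], &recv[i], 1, MPI_INT, MPI_SUM,
                           MPI_COMM_WORLD, &req);
            requests.push_back(req);
        }

        // ... unrelated work can overlap with the reductions here ...

        // Complete every outstanding reduction in one call. To inspect each
        // completion individually, pass statuses.data() from a
        // std::vector<MPI_Status> instead of MPI_STATUSES_IGNORE.
        MPI_Waitall(static_cast<int>(requests.size()), requests.data(),
                    MPI_STATUSES_IGNORE);
        requests.clear();

        if (rank == 0) {
            std::printf("sum of ranks (reduction 0): %d\n", recv[0]);
        }

        MPI_Finalize();
        return 0;
    }

    If the vector lives in some long-lived context object, each MPI_Iallreduce call site only needs access to that one container, and the "pause point" in the codebase reduces to a single MPI_Waitall followed by requests.clear().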