Tags: c++, mpi, mpich

Is MPI_Bcast supposed to work with MPI_Ibcast?


As far as I know, you can freely mix blocking and nonblocking MPI point-to-point operations on the two ends of a communication, meaning that a message sent with MPI_Send(...) can be received with MPI_Irecv(...).
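
For context, this is the kind of point-to-point mixing I mean (a minimal sketch, assuming exactly two processes):

#include <mpi.h>
#include <stdio.h>

int main(void) {
  int rank, value = 0;
  MPI_Init(NULL, NULL);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  if (rank == 0) {
    value = 42;
    /* blocking send on one end... */
    MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
  } else if (rank == 1) {
    MPI_Request req;
    /* ...matched by a nonblocking receive on the other */
    MPI_Irecv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
    /* do other work here, then complete the receive */
    MPI_Wait(&req, MPI_STATUS_IGNORE);
    printf("rank 1 received %d\n", value);
  }

  MPI_Finalize();
  return 0;
}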

That said, I could not get an MPI_Bcast(...) to match an MPI_Ibcast(...), as in the example below:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  MPI_Init(NULL, NULL);

  int world_rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

  MPI_Request req;
  int i;

  if (world_rank == 0) {
    i = 126;
    /* the root starts a nonblocking broadcast... */
    MPI_Ibcast(&i, 1, MPI_INT, 0, MPI_COMM_WORLD, &req);
    // do other stuff

    MPI_Wait(&req, MPI_STATUS_IGNORE);
  } else {
    /* ...while every other rank calls the blocking variant */
    MPI_Bcast(&i, 1, MPI_INT, 0, MPI_COMM_WORLD);
  }

  MPI_Barrier(MPI_COMM_WORLD);
  MPI_Finalize();
  return 0;
}

Is this supposed to work? I could not find anything about this in the MPI documentation.

I'm using MPICH 3.3.2 with GCC 10.2.1.


Solution

  • Short answer: no, the code is not supposed to work.

    Long answer: blocking and nonblocking collective calls cannot be matched, because:

    1. Collective operations do not have a tag argument (tags on collectives could prevent certain hardware optimizations), so messages cannot be matched the way they are in point-to-point operations.

    2. An implementation might use different communication algorithms for the blocking and nonblocking cases. For example, a blocking collective may be optimized for minimal time to completion, while a nonblocking one balances completion time against CPU overhead and asynchronous progression.

    This is stated explicitly in the MPI 3.1 standard:

    Unlike point-to-point operations, nonblocking collective operations do not match with blocking collective operations, and collective operations do not have a tag argument. All processes must call collective operations (blocking and nonblocking) in the same order per communicator. In particular, once a process calls a collective operation, all other processes in the communicator must eventually call the same collective operation, and no other collective operation with the same communicator in between. This is consistent with the ordering rules for blocking collective operations in threaded environments.

    Rationale. Matching blocking and nonblocking collective operations is not allowed because the implementation might use different communication algorithms for the two cases. Blocking collective operations may be optimized for minimal time to completion, while nonblocking collective operations may balance time to completion with CPU overhead and asynchronous progression. The use of tags for collective operations can prevent certain hardware optimizations. (End of rationale.)
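
    So to fix your example, every rank must make the same collective call on the communicator. One way (a minimal sketch based on your code) is to have all ranks call MPI_Ibcast:

    #include <mpi.h>
    #include <stdio.h>

    int main(void) {
      int world_rank, i = 0;
      MPI_Request req;

      MPI_Init(NULL, NULL);
      MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

      if (world_rank == 0)
        i = 126;  /* only the root fills the buffer */

      /* every rank calls the same nonblocking collective... */
      MPI_Ibcast(&i, 1, MPI_INT, 0, MPI_COMM_WORLD, &req);
      /* ...may do other stuff here... */
      MPI_Wait(&req, MPI_STATUS_IGNORE);  /* ...then completes it */

      printf("rank %d has i = %d\n", world_rank, i);

      MPI_Finalize();
      return 0;
    }

    Alternatively, all ranks could call the blocking MPI_Bcast; what matters is that every process in the communicator uses the same variant of the same collective.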

    Hope this helps!