Tags: c, performance, parallel-processing, mpi, hpc

Does MPI support only broadcasting?


What I want to achieve is to broadcast a partial result to the other threads and receive the other threads' results at a different point in the code. It can be expressed as the following pseudo code:

if have any incoming message:
    read the message and compare it with the local optimal
    if is optimal:
        update the local optimal

calculate local result
if local result is better than local optimal:
    update local optimal
    send the local optimal to others

The problem is that MPI_Bcast/MPI_Ibcast do the send and the receive in the same call, whereas what I want is a separate send and receive. I wonder whether MPI has built-in support for my purpose, or whether I can only achieve this by calling MPI_Send/MPI_Isend in a for loop.


Solution

  • What I want to achieve is to broadcast a partial result to the other threads and receive the other threads' results at a different point in the code. It can be expressed as the following pseudo code:

    Typically, in MPI and in this context, one uses the term process rather than thread, since each MPI rank is a separate process.

    The problem is that MPI_Bcast/MPI_Ibcast do the send and the receive in the same call, whereas what I want is a separate send and receive.

    This is the typical use case for an MPI_Allreduce:

    Combines values from all processes and distributes the result back to all processes

    Here is an example that illustrates your pseudo code:

    #include <stdio.h>
    #include <mpi.h>
    
    int main(int argc, char *argv[]){
        MPI_Init(&argc, &argv); // Initialize the MPI environment
        int world_rank;
        int world_size;
        MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
        MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    
        int my_local_optimal = world_rank;
    
        // MPI_IN_PLACE uses the receive buffer as the send buffer as well;
        // passing &my_local_optimal twice would alias the two buffers,
        // which the MPI standard forbids.
        MPI_Allreduce(MPI_IN_PLACE, &my_local_optimal, 1, MPI_INT, MPI_MAX, MPI_COMM_WORLD);
        printf("Step 1 : Process %d -> max local %d\n", world_rank, my_local_optimal);
    
        my_local_optimal += world_rank * world_size;
    
        MPI_Allreduce(MPI_IN_PLACE, &my_local_optimal, 1, MPI_INT, MPI_MAX, MPI_COMM_WORLD);
        printf("Step 2 : Process %d -> max local %d\n", world_rank, my_local_optimal);
    
        MPI_Finalize();
        return 0;
    }
    

    So all processes start with a local optimal:

      int my_local_optimal = world_rank;
    

    Then they perform an MPI_Allreduce:

    MPI_Allreduce(MPI_IN_PLACE, &my_local_optimal, 1, MPI_INT, MPI_MAX, MPI_COMM_WORLD);
    

    which computes the maximum (MPI_MAX) of my_local_optimal across all processes and stores the result back into my_local_optimal on every process. Note the MPI_IN_PLACE argument: the MPI standard forbids passing the same buffer as both the send and the receive argument, so MPI_IN_PLACE is the sanctioned way to reduce a value in place.
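
    For instance, compiled with mpicc and run with 4 processes (e.g. mpirun -np 4 ./a.out; invocation details are an assumption, not part of the question), every process ends up with the same values. The lines can interleave differently between runs, but sorted they read:

    Step 1 : Process 0 -> max local 3
    Step 1 : Process 1 -> max local 3
    Step 1 : Process 2 -> max local 3
    Step 1 : Process 3 -> max local 3
    Step 2 : Process 0 -> max local 15
    Step 2 : Process 1 -> max local 15
    Step 2 : Process 2 -> max local 15
    Step 2 : Process 3 -> max local 15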

    Conceptually, the difference between the aforementioned approach and:

    if have any incoming message:
        read the message and compare it with the local optimal
        if is optimal:
            update the local optimal
    

    is that you neither explicitly check "if have any incoming message:" nor "if is optimal": you just compute the max across all processes and update the local optimal accordingly. This makes the approach much simpler to reason about.
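
    That said, the explicit pattern from your pseudo code is also possible without a collective. Below is a minimal sketch, not a drop-in implementation: it assumes integer optima, "better" meaning larger, and a hypothetical tag TAG_OPT. Each process polls for pending updates with MPI_Iprobe and pushes its own improvements to every other rank with MPI_Isend. A real implementation would also need a termination protocol so that no messages are left in flight at MPI_Finalize.

    #include <stdlib.h>
    #include <mpi.h>
    
    #define TAG_OPT 42 /* hypothetical tag for optimum updates */
    
    int main(int argc, char *argv[]) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
    
        int local_optimal = rank; /* placeholder initial value */
        int send_buf;             /* must stay valid until the sends complete */
        MPI_Request *reqs = malloc(size * sizeof(MPI_Request)); /* at most size-1 used */
    
        for (int step = 0; step < 10; ++step) {
            /* "if have any incoming message": poll without blocking */
            int flag;
            MPI_Status status;
            MPI_Iprobe(MPI_ANY_SOURCE, TAG_OPT, MPI_COMM_WORLD, &flag, &status);
            while (flag) {
                int incoming;
                MPI_Recv(&incoming, 1, MPI_INT, status.MPI_SOURCE, TAG_OPT,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                if (incoming > local_optimal) /* "if is optimal": larger wins here */
                    local_optimal = incoming;
                MPI_Iprobe(MPI_ANY_SOURCE, TAG_OPT, MPI_COMM_WORLD, &flag, &status);
            }
    
            /* "calculate local result": stand-in computation */
            int local_result = rank * size + step;
    
            if (local_result > local_optimal) {
                local_optimal = local_result;
                send_buf = local_optimal;
                int n = 0;
                for (int r = 0; r < size; ++r)
                    if (r != rank)
                        MPI_Isend(&send_buf, 1, MPI_INT, r, TAG_OPT,
                                  MPI_COMM_WORLD, &reqs[n++]);
                /* complete the sends before send_buf is reused */
                MPI_Waitall(n, reqs, MPI_STATUSES_IGNORE);
            }
        }
    
        free(reqs);
        /* NOTE: a real code must drain or otherwise account for outstanding
           messages before finalizing; omitted in this sketch. */
        MPI_Finalize();
        return 0;
    }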

    In the Allreduce example above I used MPI_MAX; in your code, use the reduction operation that defines what is optimal.
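
    If none of the predefined operations (MPI_MAX, MPI_MIN, MPI_SUM, ...) captures your notion of optimal, MPI also lets you define your own reduction with MPI_Op_create. Here is a small sketch under an assumed example criterion ("closest to zero"; the function name and criterion are mine, not from the question):

    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>
    
    /* User-defined reduction with the signature MPI_Op_create expects:
       keep whichever value is closer to zero (example criterion),
       breaking ties toward the smaller value so the operation stays
       commutative and associative. */
    static void closest_to_zero(void *in, void *inout, int *len, MPI_Datatype *dt) {
        (void)dt; /* this sketch assumes MPI_INT */
        int *a = (int *)in, *b = (int *)inout;
        for (int i = 0; i < *len; ++i)
            if (abs(a[i]) < abs(b[i]) ||
                (abs(a[i]) == abs(b[i]) && a[i] < b[i]))
                b[i] = a[i];
    }
    
    int main(int argc, char *argv[]) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    
        int my_local_optimal = rank - 2; /* mix of negative and positive values */
    
        MPI_Op my_opt;
        MPI_Op_create(closest_to_zero, 1 /* commutative */, &my_opt);
        MPI_Allreduce(MPI_IN_PLACE, &my_local_optimal, 1, MPI_INT, my_opt,
                      MPI_COMM_WORLD);
        MPI_Op_free(&my_opt);
    
        printf("Process %d -> optimum %d\n", rank, my_local_optimal);
        MPI_Finalize();
        return 0;
    }

    With 4 processes the local values are -2, -1, 0, 1, so every process ends up with 0.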