Tags: c, performance, parallel-processing, mpi, hpc

Does MPI support only broadcasting?


What I want to achieve is to broadcast a partial result to the other threads and receive the other threads' results at a different point in the code. It can be expressed as the following pseudo code:

if have any incoming message:
    read the message and compare it with the local optimal
    if is optimal:
        update the local optimal

calculate local result
if local result is better than local optimal:
    update local optimal
    send the local optimal to others

The problem is that MPI_Bcast/MPI_Ibcast do the send and the receive in the same call, whereas what I want is a separate send and receive. I wonder whether MPI has built-in support for my purpose, or whether I can only achieve this by calling MPI_Send/MPI_Isend in a for loop.


Solution

  • What I want to achieve is to broadcast a partial result to the other threads and receive the other threads' results at a different point in the code. It can be expressed as the following pseudo code:

    Typically, in MPI and in this context, one uses the term process rather than thread, since each MPI rank is a separate process.

    The problem is that MPI_Bcast/MPI_Ibcast do the send and the receive in the same call, whereas what I want is a separate send and receive.

    This is the typical use case for an MPI_Allreduce:

    Combines values from all processes and distributes the result back to all processes

    Here is an example that illustrates your pseudo code:

    #include <stdio.h>
    #include <mpi.h>
    
    int main(int argc, char *argv[]){
        MPI_Init(&argc, &argv); // Initialize the MPI environment
        int world_rank;
        int world_size;
        MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
        MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    
        int my_local_optimal = world_rank;
    
        // MPI_IN_PLACE uses the receive buffer as the send buffer as well;
        // passing &my_local_optimal twice would alias the two buffers,
        // which the MPI standard forbids.
        MPI_Allreduce(MPI_IN_PLACE, &my_local_optimal, 1, MPI_INT, MPI_MAX, MPI_COMM_WORLD);
        printf("Step 1 : Process %d -> max local %d\n", world_rank, my_local_optimal);
    
        my_local_optimal += world_rank * world_size;
    
        MPI_Allreduce(MPI_IN_PLACE, &my_local_optimal, 1, MPI_INT, MPI_MAX, MPI_COMM_WORLD);
        printf("Step 2 : Process %d -> max local %d\n", world_rank, my_local_optimal);
    
        MPI_Finalize();
        return 0;
    }
    

    So all processes start with a local optimal:

      int my_local_optimal = world_rank;
    

    Then they perform an MPI_Allreduce:

    MPI_Allreduce(MPI_IN_PLACE, &my_local_optimal, 1, MPI_INT, MPI_MAX, MPI_COMM_WORLD);
    

    which computes the maximum (MPI_MAX) of my_local_optimal across all processes and stores the result back into my_local_optimal on every process. Note the MPI_IN_PLACE argument: the MPI standard forbids passing the same buffer as both the send and the receive argument, so MPI_IN_PLACE is the sanctioned way to reduce a value in place.
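
    For instance, compiled with mpicc and run with 4 processes (e.g. mpirun -np 4 ./a.out; invocation details are an assumption, not part of the question), every process ends up with the same values. The lines can interleave differently between runs, but sorted they read:

    Step 1 : Process 0 -> max local 3
    Step 1 : Process 1 -> max local 3
    Step 1 : Process 2 -> max local 3
    Step 1 : Process 3 -> max local 3
    Step 2 : Process 0 -> max local 15
    Step 2 : Process 1 -> max local 15
    Step 2 : Process 2 -> max local 15
    Step 2 : Process 3 -> max local 15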

    Conceptually, the difference between the aforementioned approach and:

    if have any incoming message:
        read the message and compare it with the local optimal
        if is optimal:
            update the local optimal
    

    is that you neither explicitly check "if have any incoming message:" nor "if is optimal": you just compute the max across all processes and update the local optimal accordingly. This makes the approach much simpler to reason about.
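
    That said, the explicit pattern from your pseudo code is also possible without a collective. Below is a minimal sketch, not a drop-in implementation: it assumes integer optima, "better" meaning larger, and a hypothetical tag TAG_OPT. Each process polls for pending updates with MPI_Iprobe and pushes its own improvements to every other rank with MPI_Isend. A real implementation would also need a termination protocol so that no messages are left in flight at MPI_Finalize.

    #include <stdlib.h>
    #include <mpi.h>
    
    #define TAG_OPT 42 /* hypothetical tag for optimum updates */
    
    int main(int argc, char *argv[]) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
    
        int local_optimal = rank; /* placeholder initial value */
        int send_buf;             /* must stay valid until the sends complete */
        MPI_Request *reqs = malloc(size * sizeof(MPI_Request)); /* at most size-1 used */
    
        for (int step = 0; step < 10; ++step) {
            /* "if have any incoming message": poll without blocking */
            int flag;
            MPI_Status status;
            MPI_Iprobe(MPI_ANY_SOURCE, TAG_OPT, MPI_COMM_WORLD, &flag, &status);
            while (flag) {
                int incoming;
                MPI_Recv(&incoming, 1, MPI_INT, status.MPI_SOURCE, TAG_OPT,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                if (incoming > local_optimal) /* "if is optimal": larger wins here */
                    local_optimal = incoming;
                MPI_Iprobe(MPI_ANY_SOURCE, TAG_OPT, MPI_COMM_WORLD, &flag, &status);
            }
    
            /* "calculate local result": stand-in computation */
            int local_result = rank * size + step;
    
            if (local_result > local_optimal) {
                local_optimal = local_result;
                send_buf = local_optimal;
                int n = 0;
                for (int r = 0; r < size; ++r)
                    if (r != rank)
                        MPI_Isend(&send_buf, 1, MPI_INT, r, TAG_OPT,
                                  MPI_COMM_WORLD, &reqs[n++]);
                /* complete the sends before send_buf is reused */
                MPI_Waitall(n, reqs, MPI_STATUSES_IGNORE);
            }
        }
    
        free(reqs);
        /* NOTE: a real code must drain or otherwise account for outstanding
           messages before finalizing; omitted in this sketch. */
        MPI_Finalize();
        return 0;
    }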

    In the Allreduce example above I used MPI_MAX; in your code, use the reduction operation that defines what is optimal.
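
    If none of the predefined operations (MPI_MAX, MPI_MIN, MPI_SUM, ...) captures your notion of optimal, MPI also lets you define your own reduction with MPI_Op_create. Here is a small sketch under an assumed example criterion ("closest to zero"; the function name and criterion are mine, not from the question):

    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>
    
    /* User-defined reduction with the signature MPI_Op_create expects:
       keep whichever value is closer to zero (example criterion),
       breaking ties toward the smaller value so the operation stays
       commutative and associative. */
    static void closest_to_zero(void *in, void *inout, int *len, MPI_Datatype *dt) {
        (void)dt; /* this sketch assumes MPI_INT */
        int *a = (int *)in, *b = (int *)inout;
        for (int i = 0; i < *len; ++i)
            if (abs(a[i]) < abs(b[i]) ||
                (abs(a[i]) == abs(b[i]) && a[i] < b[i]))
                b[i] = a[i];
    }
    
    int main(int argc, char *argv[]) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    
        int my_local_optimal = rank - 2; /* mix of negative and positive values */
    
        MPI_Op my_opt;
        MPI_Op_create(closest_to_zero, 1 /* commutative */, &my_opt);
        MPI_Allreduce(MPI_IN_PLACE, &my_local_optimal, 1, MPI_INT, my_opt,
                      MPI_COMM_WORLD);
        MPI_Op_free(&my_opt);
    
        printf("Process %d -> optimum %d\n", rank, my_local_optimal);
        MPI_Finalize();
        return 0;
    }

    With 4 processes the local values are -2, -1, 0, 1, so every process ends up with 0.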