What I want to achieve is to broadcast a partial result to other threads and receive other threads' results at a different point in the code. It can be expressed as the following pseudo code:
if have any incoming message:
    read the message and compare it with the local optimal
    if is optimal:
        update the local optimal
calculate local result
if local result is better than local optimal:
    update local optimal
    send the local optimal to others
The question is, MPI_Bcast/MPI_Ibcast do the send and the receive in the same place, whereas what I want is a separate send and receive. I wonder whether MPI has built-in support for my purpose, or whether I can only achieve this by calling MPI_Send/MPI_Isend in a for loop?
What I want to achieve is to broadcast a partial result to other threads and receive other threads' results at a different point in the code.
Typically, in MPI and in this context, one tends to use the term process rather than thread.
The question is, MPI_Bcast/MPI_Ibcast do the send and the receive in the same place, whereas what I want is a separate send and receive.
This is the typical use case for an MPI_Allreduce:
Combines values from all processes and distributes the result back to all processes
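For reference, its C prototype is:

int MPI_Allreduce(const void *sendbuf, void *recvbuf, int count,
                  MPI_Datatype datatype, MPI_Op op, MPI_Comm comm);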
So an example that illustrates your pseudo code:
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]){
    MPI_Init(&argc, &argv); // Initialize the MPI environment
    int world_rank;
    int world_size;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    int my_local_optimal = world_rank;
    // MPI_IN_PLACE: the MPI standard forbids passing the same buffer
    // as both sendbuf and recvbuf, so reduce "in place" instead.
    MPI_Allreduce(MPI_IN_PLACE, &my_local_optimal, 1, MPI_INT, MPI_MAX, MPI_COMM_WORLD);
    printf("Step 1 : Process %d -> max local %d \n", world_rank, my_local_optimal);

    my_local_optimal += world_rank * world_size;
    MPI_Allreduce(MPI_IN_PLACE, &my_local_optimal, 1, MPI_INT, MPI_MAX, MPI_COMM_WORLD);
    printf("Step 2 : Process %d -> max local %d \n", world_rank, my_local_optimal);

    MPI_Finalize();
    return 0;
}
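Assuming the usual mpicc/mpirun wrappers (names vary across MPI distributions, and the file name here is illustrative), the example can be compiled and run as follows. With 4 processes, step 1 prints the maximum rank (3) on every process, and step 2 prints the maximum of 3 + rank * 4, i.e. 15:

mpicc allreduce_example.c -o allreduce_example
mpirun -np 4 ./allreduce_example

The order in which the output lines appear may vary from run to run.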
So all processes start with a local optimal:
int my_local_optimal = world_rank;
then they perform an MPI_Allreduce:

MPI_Allreduce(MPI_IN_PLACE, &my_local_optimal, 1, MPI_INT, MPI_MAX, MPI_COMM_WORLD);

which basically takes the maximum (i.e., MPI_MAX) of my_local_optimal across all processes and stores the result back into my_local_optimal.
Conceptually, the difference between the aforementioned approach and:
if have any incoming message:
    read the message and compare it with the local optimal
    if is optimal:
        update the local optimal
is that you neither explicitly check "if have any incoming message" nor "if is optimal": you just calculate the max across the processes and update the local optimal accordingly. This makes the approach much simpler to handle.
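For comparison, the manual pattern from the question would look roughly like the sketch below: MPI_Iprobe is the usual way to check for a pending message without blocking, and the "broadcast" becomes one point-to-point send per peer. (This is a sketch only; the helper names and BEST_TAG are placeholders I made up, not part of any API.)

#include <mpi.h>

#define BEST_TAG 0  /* illustrative message tag */

/* "if have any incoming message": poll with MPI_Iprobe and drain
   every pending message before continuing with local work. */
void receive_optima(int *local_optimal){
    int flag = 0;
    MPI_Status status;
    MPI_Iprobe(MPI_ANY_SOURCE, BEST_TAG, MPI_COMM_WORLD, &flag, &status);
    while (flag){
        int incoming;
        MPI_Recv(&incoming, 1, MPI_INT, status.MPI_SOURCE, BEST_TAG,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        if (incoming > *local_optimal)      /* "if is optimal" */
            *local_optimal = incoming;      /* "update the local optimal" */
        MPI_Iprobe(MPI_ANY_SOURCE, BEST_TAG, MPI_COMM_WORLD, &flag, &status);
    }
}

/* "send the local optimal to others": one point-to-point send per peer.
   MPI_Isend plus MPI_Waitall would avoid the (unlikely) blocking of
   MPI_Send on a single int. */
void send_optimum(int local_optimal, int world_rank, int world_size){
    for (int r = 0; r < world_size; r++)
        if (r != world_rank)
            MPI_Send(&local_optimal, 1, MPI_INT, r, BEST_TAG, MPI_COMM_WORLD);
}

Besides being more code, this pattern makes progress and termination harder to reason about, which is why the collective is preferable here.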
In my example I have used MPI_MAX; in your code, however, you need to use the operation that defines what counts as optimal.
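If "optimal" is a plain max or min, the predefined operations suffice (and MPI_MAXLOC/MPI_MINLOC with MPI_2INT additionally track which rank produced the value). For anything more involved, you can define your own reduction with MPI_Op_create. A minimal sketch, where result_t, pick_better, and the cost formula are invented for illustration:

#include <stdio.h>
#include <mpi.h>

/* Hypothetical payload: "optimal" = smallest cost; the second int
   records the rank that produced it. The two-int layout matches MPI_2INT. */
typedef struct { int cost; int rank; } result_t;

/* User-defined reduction: keep whichever entry has the smaller cost. */
void pick_better(void *invec, void *inoutvec, int *len, MPI_Datatype *datatype){
    result_t *in = (result_t*)invec;
    result_t *inout = (result_t*)inoutvec;
    for (int i = 0; i < *len; i++)
        if (in[i].cost < inout[i].cost)
            inout[i] = in[i];
}

int main(int argc, char *argv[]){
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    result_t local = { (rank * 7 + 3) % 10, rank }; // made-up local cost

    MPI_Op my_op;
    MPI_Op_create(pick_better, 1 /* commutative */, &my_op);
    MPI_Allreduce(MPI_IN_PLACE, &local, 1, MPI_2INT, my_op, MPI_COMM_WORLD);
    MPI_Op_free(&my_op);

    printf("Process %d: best cost %d from rank %d\n", rank, local.cost, local.rank);
    MPI_Finalize();
    return 0;
}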