
How to Broadcast elements to only certain ranks without using the MPI_Send and MPI_Recv routines


Just a general question:

I wanted to ask if there is any way to broadcast elements to only certain ranks in MPI without using the MPI_Send and MPI_Recv routines.


Solution

  • I wanted to ask if there is any way to broadcast elements to only certain ranks in MPI without using the MPI_Send and MPI_Recv routines.

    Let us start by looking at the description of the MPI_Bcast routine.

    Broadcasts a message from the process with rank "root" to all other processes of the communicator
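
    For reference, the C prototype of the routine (the root argument names the rank whose buffer contents are sent to everyone else):

    int MPI_Bcast(void *buffer, int count, MPI_Datatype datatype,
                  int root, MPI_Comm comm);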

    The MPI_Bcast routine is a collective communication operation. Hence:

    Collective communication is a method of communication which involves participation of all processes in a communicator.

    Notice the phrase "all processes in a communicator". Therefore, one approach (to achieve what you want) is to create a subset composed of the processes that will participate in the broadcast. This subset can be materialized through the creation of a new MPI communicator. To create that communicator, one can use the MPI routine MPI_Comm_split. About that routine, from the source one can read:

    As the name implies, MPI_Comm_split creates new communicators by “splitting” a communicator into a group of sub-communicators based on the input values color and key. It’s important to note here that the original communicator doesn’t go away, but a new communicator is created on each process.

    The first argument, comm, is the communicator that will be used as the basis for the new communicators. This could be MPI_COMM_WORLD, but it could be any other communicator as well.

    The second argument, color, determines to which new communicator each process will belong. All processes which pass in the same value for color are assigned to the same communicator. If the color is MPI_UNDEFINED, that process won’t be included in any of the new communicators.

    The third argument, key, determines the ordering (rank) within each new communicator. The process which passes in the smallest value for key will be rank 0, the next smallest will be rank 1, and so on. If there is a tie, the process that had the lower rank in the original communicator will be first.

    The final argument, newcomm, is how MPI returns the new communicator back to the user.
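
    As a small aside, the key argument on its own can be used to reorder ranks in the sub-communicator. A minimal sketch (assuming world_rank and world_size were obtained with MPI_Comm_rank and MPI_Comm_size as usual): giving every process the same color but a reversed key makes the highest world rank become rank 0 of the new communicator.

    MPI_Comm reversed_comm;
    /* Same color everywhere, so no process is excluded; the key reverses the order. */
    MPI_Comm_split(MPI_COMM_WORLD, 0, world_size - 1 - world_rank, &reversed_comm);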

    Let us say that we wanted to have only the processes with an even rank participate in the MPI_Bcast. We would first create the communicator:

    MPI_Comm new_comm;
    /* Even ranks get color 1; odd ranks pass MPI_UNDEFINED and are left out. */
    int color = (world_rank % 2 == 0) ? 1 : MPI_UNDEFINED;
    MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &new_comm);
    

    and then call the MPI_Bcast on the new communicator (note that the root argument 0 now refers to rank 0 of new_comm, not of MPI_COMM_WORLD):

    if (world_rank % 2 == 0) {
        ...
        MPI_Bcast(&bcast_value, 1, MPI_INT, 0, new_comm);
        ...
    }
    

    Finally, we would free the communicator:

    MPI_Comm_free(&new_comm);
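
    Note that on the odd ranks (which passed MPI_UNDEFINED), MPI_Comm_split returns MPI_COMM_NULL, and MPI_COMM_NULL must not be passed to MPI_Comm_free. An equivalent guard, as a sketch, is to test the communicator itself instead of the rank parity:

    if (new_comm != MPI_COMM_NULL)
        MPI_Comm_free(&new_comm);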
    

    A running code example:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[]){
        MPI_Init(&argc, &argv); // Initialize the MPI environment
        int world_rank;
        int world_size;
        MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
        MPI_Comm_size(MPI_COMM_WORLD, &world_size);

        // First broadcast: every process of MPI_COMM_WORLD participates.
        int bcast_value = world_rank;
        MPI_Bcast(&bcast_value, 1, MPI_INT, 0, MPI_COMM_WORLD);
        printf("MPI_Bcast 1 : MPI_COMM_WORLD ProcessID = %d, bcast_value = %d \n", world_rank, bcast_value);

        // Split MPI_COMM_WORLD: even ranks join new_comm, odd ranks get MPI_COMM_NULL.
        MPI_Comm new_comm;
        int color = (world_rank % 2 == 0) ? 1 : MPI_UNDEFINED;
        MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &new_comm);

        if(world_rank % 2 == 0){
            int new_comm_rank, new_comm_size;
            MPI_Comm_rank(new_comm, &new_comm_rank);
            MPI_Comm_size(new_comm, &new_comm_size);

            // Second broadcast: only the even ranks (members of new_comm) participate.
            bcast_value = 1000;
            MPI_Bcast(&bcast_value, 1, MPI_INT, 0, new_comm);

            printf("MPI_Bcast 2 : MPI_COMM_WORLD ProcessID = %d, new_comm = %d, bcast_value = %d \n", world_rank, new_comm_rank, bcast_value);
            MPI_Comm_free(&new_comm);
        }
        MPI_Finalize();
        return 0;
    }
    

    This code example showcases two MPI_Bcast calls: one with all the processes of MPI_COMM_WORLD (i.e., MPI_Bcast 1) and another with only a subset of those processes (i.e., MPI_Bcast 2).

    One possible output (for 8 processes; the print order is nondeterministic):

    MPI_Bcast 1 : MPI_COMM_WORLD ProcessID = 0, bcast_value = 0 
    MPI_Bcast 1 : MPI_COMM_WORLD ProcessID = 4, bcast_value = 0 
    MPI_Bcast 1 : MPI_COMM_WORLD ProcessID = 5, bcast_value = 0 
    MPI_Bcast 1 : MPI_COMM_WORLD ProcessID = 6, bcast_value = 0 
    MPI_Bcast 1 : MPI_COMM_WORLD ProcessID = 7, bcast_value = 0 
    MPI_Bcast 1 : MPI_COMM_WORLD ProcessID = 1, bcast_value = 0 
    MPI_Bcast 1 : MPI_COMM_WORLD ProcessID = 2, bcast_value = 0 
    MPI_Bcast 1 : MPI_COMM_WORLD ProcessID = 3, bcast_value = 0 
    MPI_Bcast 2 : MPI_COMM_WORLD ProcessID = 0, new_comm = 0, bcast_value = 1000 
    MPI_Bcast 2 : MPI_COMM_WORLD ProcessID = 4, new_comm = 2, bcast_value = 1000 
    MPI_Bcast 2 : MPI_COMM_WORLD ProcessID = 2, new_comm = 1, bcast_value = 1000 
    MPI_Bcast 2 : MPI_COMM_WORLD ProcessID = 6, new_comm = 3, bcast_value = 1000
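
    For completeness: when the target ranks do not follow a simple pattern expressible as a color, another option (in MPI-3 and later) is to build the sub-communicator explicitly from a group. The following is a minimal sketch, where ranks_to_include is a hypothetical list of the world ranks that should receive the broadcast (it assumes at least 6 processes were launched):

    MPI_Group world_group, sub_group;
    MPI_Comm sub_comm = MPI_COMM_NULL;
    int ranks_to_include[3] = {0, 2, 5}; /* hypothetical subset of world ranks */

    MPI_Comm_group(MPI_COMM_WORLD, &world_group);
    MPI_Group_incl(world_group, 3, ranks_to_include, &sub_group);

    /* MPI_Comm_create_group is collective only over the members of sub_group,
       so only those processes call it. MPI_Group_rank reports MPI_UNDEFINED
       for non-members. */
    int sub_rank;
    MPI_Group_rank(sub_group, &sub_rank);
    if (sub_rank != MPI_UNDEFINED)
        MPI_Comm_create_group(MPI_COMM_WORLD, sub_group, 0, &sub_comm);

    if (sub_comm != MPI_COMM_NULL) {
        MPI_Bcast(&bcast_value, 1, MPI_INT, 0, sub_comm);
        MPI_Comm_free(&sub_comm);
    }
    MPI_Group_free(&sub_group);
    MPI_Group_free(&world_group);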