I am creating a large block of data in each process, and I need to share this data between the processes using an MPI broadcast.
How can I minimise the cost? Is there an algorithm for broadcasting massive amounts of data between processes with MPI?
Bear in mind what MPI_Bcast does:
During a broadcast, one process sends the same data to all processes in a communicator.
If that is what you want, you will have to rely on the underlying hardware and on your MPI implementation providing an efficient MPI_Bcast routine. Depending on the implementation, it might even happen that MPI_Reduce is actually faster than MPI_Bcast. Nevertheless, some implementations, Open MPI for instance, let you further tune the algorithm used by MPI_Bcast
with the following flags:
--mca coll_tuned_use_dynamic_rules 1 --mca coll_tuned_bcast_algorithm 4
Another option is to try and use the non-blocking version of the MPI broadcast, namely MPI_Ibcast:
Broadcasts a message from the process with rank "root" to all other processes of the communicator in a nonblocking way
With it you can try to overlap computation with communication. However, that computation must not modify the buffer used by the MPI routine until the operation has completed (more information on why can be seen here).