I put this code only as an example so that you can understand what I am looking for:
double *f  = malloc(sizeof(double) * nx * ny);
double *f2 = calloc(nx * ny, sizeof(double));   /* zero-filled, so MPI_SUM only picks up each rank's own block */

/* each process fills its own contiguous block of rows */
for (int i = process * (nx / totalProcesses); i < (process + 1) * (nx / totalProcesses); i++)
{
    for (int j = 0; j < ny; j++)
    {
        f2[i * ny + j] = j * i;
    }
}

MPI_Allreduce(f2, f, nx * ny, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
And yes, it works: in the end I have the correct result in 'f', which is what I want. But I would like to know whether there is a better or more direct way to achieve the same thing, for efficiency. I tried it with MPI_Allgather but could not get a correct result.
No. In the given context, using an MPI collective routine is (in theory) always at least as efficient as hand-written send/recv alternatives. Although the MPI standard does not impose it, a good implementation realizes collective routines such as MPI_Allreduce in log(p) steps, with p being the number of processes; with 64 processes, for example, the reduction finishes in about 6 combining steps rather than 63 sequential point-to-point exchanges.
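To see where the log(p) comes from, here is a rough sketch of the recursive-doubling pattern a typical implementation uses for a sum; it assumes the number of processes is a power of two, and the helper name is made up for illustration only:

#include <stdlib.h>
#include <mpi.h>

/* Illustrative only: sums "count" doubles across all ranks in log2(size) steps,
   leaving the full result in buf on every rank (power-of-two size assumed). */
static void allreduce_sum_sketch(double *buf, int count, MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    double *tmp = malloc(sizeof(double) * count);

    for (int mask = 1; mask < size; mask <<= 1)
    {
        int partner = rank ^ mask;   /* a different partner at each step */
        MPI_Sendrecv(buf, count, MPI_DOUBLE, partner, 0,
                     tmp, count, MPI_DOUBLE, partner, 0,
                     comm, MPI_STATUS_IGNORE);
        for (int k = 0; k < count; k++)
            buf[k] += tmp[k];        /* fold the partner's partial sums into ours */
    }

    free(tmp);
}

After log2(p) such exchanges every rank holds the complete sum, which is why the collective scales much better than p - 1 separate send/recv pairs.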
Bear in mind, however, that MPI_Allreduce:
Combines values from all processes and distributes the result back to all processes.
Therefore, if you do not need the result on every process, you can use MPI_Reduce instead:
Reduces values on all processes to a single value
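For example, if only one rank has to use the assembled array, the call in your snippet could be replaced by something like this (root 0 assumed; afterwards only f on rank 0 holds the summed data):

/* Sketch: same reduction as the MPI_Allreduce above, but only root 0 receives the result. */
MPI_Reduce(f2, f, nx * ny, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);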