As an example, lets say I have
int a = ...;
int b = ...;
int c;
where a
is the result of some complex local calculation and b
is some metric for the quality of a
.
I'd like to send the best value of a
to every process and store it in c
where best is defined by having the largest value of b
.
I guess I'm just wondering if there is a more efficient way of doing this than doing an allgather
on a
and b
and then searching through the resulting arrays.
The actual code involves sending and comparing several hundred values on upto several hundred/thousand processes, so any efficiency gains would be welcome.
I guess I'm just wondering if there is a more efficient way of doing this than doing an allgather on a and b and then searching through the resulting arrays.
This can be achieved with only a single MPI_AllReduce.
I will present two approaches, a simpler one (suitable for your use case); and a more generic one, for more complex use-cases. The latter will also be useful to show case MPI functionality such as custom MPI Datatypes and custom MPI reduction operators.
Approach 1
To represent
int a = ...;
int b = ...;
you could use the following struct:
typedef struct MyStruct {
int b;
int a;
} S;
then you can use the MPI Datatype MPI_2INT and the MPI operator MAXLOC
The operator MPI_MINLOC is used to compute a global minimum and also an index attached to the minimum value. **MPI_MAXLOC similarly computes a global maximum and index. One application of these is to compute a global minimum (maximum) and the rank of the process containing this value.
In your case, instead of the rank
we will be using the value of 'a'. Hence, the MPI_AllReduce
call:
S local, global;
...
MPI_Allreduce(&local, &global, 1, MPI_2INT, MPI_MAXLOC, MPI_COMM_WORLD);
The complete code would look like the following:
#include <stdio.h>
#include <mpi.h>
typedef struct MyStruct {
int b;
int a;
} S;
int main(int argc,char *argv[]){
MPI_Init(NULL,NULL); // Initialize the MPI environment
int world_rank;
int world_size;
MPI_Comm_rank(MPI_COMM_WORLD,&world_rank);
MPI_Comm_size(MPI_COMM_WORLD,&world_size);
// Some fake data
S local, global;
local.a = world_rank;
local.b = world_size - world_rank;
MPI_Allreduce(&local, &global, 1, MPI_2INT, MPI_MAXLOC, MPI_COMM_WORLD);
if(world_rank == 0){
printf("%d %d\n", global.b, global.a);
}
MPI_Finalize();
return 0;
}
Second Approach
The MPI_MAXLOC
only works for a certain number of predefined datatypes. Nonetheless, for the remaining cases you can use the following approach (based on this SO thread):
struct
that will contain the values a
and b
;MPI_Datatype
representing the 1. struct
to be sent across processes;MPI_AllReduce
:int MPI_Allreduce(const void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, MPI_Comm comm)
Combines values from all processes and distributes the result back to all processes
MAX
;I'd like to send the best value of 'a' to every process and store it in 'c' where best is defined by having the largest value of 'b'.
b
of the struct. Hence, you need to create a custom MPI_Op
max operation.Coding the approach
So let us break step-by-step the aforementioned implementation:
First define the struct
:
typedef struct MyStruct {
double a, b;
} S;
Second create the customize MPI_Datatype
:
void defineStruct(MPI_Datatype *tstype) {
const int count = 2;
int blocklens[count];
MPI_Datatype types[count];
MPI_Aint disps[count];
for (int i=0; i < count; i++){
types[i] = MPI_DOUBLE;
blocklens[i] = 1;
}
disps[0] = offsetof(S,a);
disps[1] = offsetof(S,b);
MPI_Type_create_struct(count, blocklens, disps, types, tstype);
MPI_Type_commit(tstype);
}
Very Important
Note that since we are using a struct
you have to be careful with the fact that (source)
the C standard allows arbitrary padding between the fields.
So reducing a struct
with two double
s is NOT the same as reducing an array with two double
s.
In the main you have to do:
MPI_Datatype structtype;
defineStruct(&structtype);
Third create the custom max operation:
void max_struct(void *in, void *inout, int *len, MPI_Datatype *type){
S *invals = in;
S *inoutvals = inout;
for (int i=0; i < *len; i++)
inoutvals[i].b = (inoutvals[i].b > invals[i].b) ? inoutvals[i].b : invals[i].b;
}
in the main do:
MPI_Op maxstruct;
MPI_Op_create(max_struct, 1, &maxstruct);
Finally, call the MPI_AllReduce
:
S local, global;
...
MPI_Allreduce(&local, &global, 1, structtype, maxstruct, MPI_COMM_WORLD);
The entire code put together:
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
typedef struct MyStruct {
double a, b;
} S;
void max_struct(void *in, void *inout, int *len, MPI_Datatype *type){
S *invals = in;
S *inoutvals = inout;
for (int i=0; i<*len; i++)
inoutvals[i].b = (inoutvals[i].b > invals[i].b) ? inoutvals[i].b : invals[i].b;
}
void defineStruct(MPI_Datatype *tstype) {
const int count = 2;
int blocklens[count];
MPI_Datatype types[count];
MPI_Aint disps[count];
for (int i=0; i < count; i++) {
types[i] = MPI_DOUBLE;
blocklens[i] = 1;
}
disps[0] = offsetof(S,a);
disps[1] = offsetof(S,b);
MPI_Type_create_struct(count, blocklens, disps, types, tstype);
MPI_Type_commit(tstype);
}
int main(int argc,char *argv[]){
MPI_Init(NULL,NULL); // Initialize the MPI environment
int world_rank;
int world_size;
MPI_Comm_rank(MPI_COMM_WORLD,&world_rank);
MPI_Comm_size(MPI_COMM_WORLD,&world_size);
MPI_Datatype structtype;
MPI_Op maxstruct;
S local, global;
defineStruct(&structtype);
MPI_Op_create(max_struct, 1, &maxstruct);
// Just some random values
local.a = world_rank;
local.b = world_size - world_rank;
MPI_Allreduce(&local, &global, 1, structtype, maxstruct, MPI_COMM_WORLD);
if(world_rank == 0){
double c = global.a;
printf("%f %f\n", global.b, c);
}
MPI_Finalize();
return 0;
}