To avoid allocating an intermediate buffer, it makes sense in my application for MPI_Recv to receive one single big array. On the sending side, however, the data is non-contiguous, and I'd like to make each piece available to the network interface as soon as it is ready. Something like this:
MPI_Request reqs[N];
for (/* each one of my N chunks */) {
    partial_send(chunk, &reqs[chunk->idx]);
}
MPI_Waitall(N, reqs, MPI_STATUSES_IGNORE);
Or, even better for me, something like POSIX's writev function:
/* Precalculated once. */
struct iovec iov[N];
for (/* each one of my N chunks */) {
    iov[chunk->idx].iov_base = chunk->ptr;
    iov[chunk->idx].iov_len  = chunk->len;
}

/* Done every time I need to send. */
MPI_Request req;
chunked_send(iov, &req);
MPI_Wait(&req, MPI_STATUS_IGNORE);
Is such a thing possible in MPI?
I'd like to simply comment, but I can't as I'm new to Stack Overflow and don't have sufficient reputation...
If all your chunks are aligned on regular boundaries (e.g. they are pointers into some larger contiguous array), then you should use MPI_Type_indexed, where the displacements and block lengths are all measured in multiples of the base type (here I guess that's MPI_DOUBLE). However, if the chunks have, for example, been individually malloc'd and there is no guarantee of alignment, then you'll need the more general MPI_Type_create_struct, which specifies displacements in bytes (and also allows a different type for each block, which you don't need here).
I was worried that you might have to sort the chunks so that you scan linearly through memory, i.e. so the displacements never go backwards (they are "monotonically nondecreasing"). However, I believe this constraint only applies when the type is used for file I/O with MPI-IO, not for point-to-point send/recv.