I'm seeing an MPI_ERR_TRUNCATE error with boost::mpi when performing multiple isend/irecv transfers with the same tag using serialized data. These are not concurrent transfers, i.e. no threading is involved; there is simply more than one transfer outstanding at the same time. Here's a short test program that exhibits the failure:
#include <iostream>
#include <string>
#include <vector>

#include <boost/mpi.hpp>
#include <boost/serialization/string.hpp>

static const size_t N = 2;

int main() {
    boost::mpi::environment env;
    boost::mpi::communicator world;

#if 1
    // Serialized types fail.
    typedef std::string DataType;
#define SEND_VALUE "how now brown cow"
#else
    // Native MPI types succeed.
    typedef int DataType;
#define SEND_VALUE 42
#endif

    DataType out(SEND_VALUE);
    std::vector<DataType> in(N);
    std::vector<boost::mpi::request> sends;
    std::vector<boost::mpi::request> recvs;
    sends.reserve(N);
    recvs.reserve(N);

    std::cout << "Multiple transfers with different tags\n";
    sends.clear();
    recvs.clear();
    for (size_t i = 0; i < N; ++i) {
        sends.push_back(world.isend(0, i, out));
        recvs.push_back(world.irecv(0, i, in[i]));
    }
    boost::mpi::wait_all(sends.begin(), sends.end());
    boost::mpi::wait_all(recvs.begin(), recvs.end());

    std::cout << "Multiple transfers with same tags\n";
    sends.clear();
    recvs.clear();
    for (size_t i = 0; i < N; ++i) {
        sends.push_back(world.isend(0, 0, out));
        recvs.push_back(world.irecv(0, 0, in[i]));
    }
    boost::mpi::wait_all(sends.begin(), sends.end());
    boost::mpi::wait_all(recvs.begin(), recvs.end());

    return 0;
}
In this program I first do 2 transfers on different tags, which works fine. Then I attempt 2 transfers on the same tag, which fails with:
libc++abi.dylib: terminating with uncaught exception of type boost::exception_detail::clone_impl >: MPI_Unpack: MPI_ERR_TRUNCATE: message truncated
If I use a native MPI data type so that serialization is not invoked, things seem to work. I get the same error on MacPorts boost 1.55 with OpenMPI 1.7.3, and on Debian boost 1.49 with OpenMPI 1.4.5. I tried multiple transfers with the same tag directly through the MPI C API and that appeared to work, though of course I could only transfer native MPI data types.
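For reference, the plain-MPI check was along these lines (a minimal sketch with illustrative values, not my exact test program): two isend/irecv pairs outstanding on the same tag with a native type, which completes without error.

// Minimal sketch: two outstanding isend/irecv pairs on the same tag with a
// native type (int), using the MPI C API directly. Intended to be run with a
// single rank (mpirun -np 1), so rank 0 sends to itself.
#include <mpi.h>
#include <iostream>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int out = 42;
    int in[2] = {0, 0};
    MPI_Request reqs[4];

    // Post two sends and two receives, all to/from rank 0 with tag 0.
    MPI_Isend(&out, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(&out, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &reqs[1]);
    MPI_Irecv(&in[0], 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &reqs[2]);
    MPI_Irecv(&in[1], 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &reqs[3]);

    MPI_Waitall(4, reqs, MPI_STATUSES_IGNORE);
    std::cout << in[0] << " " << in[1] << "\n";  // prints 42 42

    MPI_Finalize();
    return 0;
}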
My question is whether having multiple outstanding transfers on the same tag is a valid operation with boost::mpi, and if so, is there a bug in my program or a bug in boost::mpi?
As of the current version of boost (1.55), boost::mpi does not guarantee non-overtaking messages. This is in contrast to the underlying MPI API, which does:
Order: Messages are non-overtaking: If a sender sends two messages in succession to the same destination, and both match the same receive, then this operation cannot receive the second message if the first one is still pending. If a receiver posts two receives in succession, and both match the same message, then the second receive operation cannot be satisfied by this message, if the first one is still pending. This requirement facilitates matching of sends to receives. It guarantees that message-passing code is deterministic, if processes are single-threaded and the wildcard MPI_ANY_SOURCE is not used in receives.
The reason boost::mpi does not guarantee non-overtaking is that serialized data types are transferred in two MPI messages, one for the size and one for the payload, and the irecv for the second message cannot be posted until the first message has been examined.
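To illustrate the two-message pattern, the receive side behaves roughly as in the sketch below. This is a conceptual sketch only, not boost::mpi's actual implementation; the function name and details are illustrative.

// Conceptual sketch of the size-then-payload protocol used for serialized
// types. NOT boost::mpi's real code.
#include <mpi.h>
#include <cstddef>
#include <vector>

// Receive a serialized payload from `source` with `tag`: first the size,
// then the payload. The second receive cannot be posted until the size
// message has been seen, so with two such transfers outstanding on the
// same tag, a payload receive can end up matching the other transfer's
// size message.
std::vector<char> recv_serialized(MPI_Comm comm, int source, int tag) {
    std::size_t size = 0;

    // Message 1: the payload size.
    MPI_Recv(&size, sizeof(size), MPI_BYTE, source, tag, comm,
             MPI_STATUS_IGNORE);

    // Message 2: the payload. If a different transfer's message on the
    // same source/tag is matched here and its length does not fit the
    // buffer, MPI reports MPI_ERR_TRUNCATE.
    std::vector<char> buffer(size);
    MPI_Recv(buffer.data(), static_cast<int>(size), MPI_BYTE, source, tag,
             comm, MPI_STATUS_IGNORE);
    return buffer;
}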
A proposal to guarantee non-overtaking in boost::mpi is being considered. Further discussion can be found on the boost::mpi mailing list, beginning here.