parallel-processing, multiprocessing, mpi

Why not use MPI_BYTE always?


If I do not have to care about portability across heterogeneous systems (endianness, ...):

Why not use MPI_BYTE for all communication?

Especially for collectives, and when dealing with composite data types, that would make life much easier.

edit: I just found MPI and C structs. The answers there are applicable to my question.


Solution

  • Here's how you send a slice of a 3-dimensional NxNxN array using types:

    double array[N][N][N];
    
    /* ... */
    
    MPI_Datatype xslice, yslice, zslice;
    
    /* describe the slab made of the two xz-planes at y = N-2 and y = N-1 */
    int starts[3]   = {0,N-2,0};
    int sizes[3]    = {N,N,N};
    int subsizes[3] = {N,2,N};
    
    MPI_Type_create_subarray(3, sizes, subsizes, starts, MPI_ORDER_C, MPI_DOUBLE, &yslice);
    MPI_Type_commit(&yslice);
    
    /* ... */
    
    /* one call sends the whole non-contiguous slab straight out of the array, no packing */
    MPI_Send(&(array[0][0][0]), 1, yslice, neigh, ytag, MPI_COMM_WORLD);
    

    What's the easier way to do that, with less typing, using only MPI_BYTE and no other type constructors?
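
    For comparison, here is roughly what you are signed up for with MPI_BYTE alone. This is a sketch of my own, not from the answer, assuming a malloc'd scratch buffer (and <stdlib.h> for malloc/free): pack the non-contiguous slab by hand, send the raw bytes, and unpack by hand on the other side.

    /* pack the two xz-planes at y = N-2 and y = N-1 into a contiguous
       scratch buffer, then ship the raw bytes */
    double *buf = malloc(2 * N * N * sizeof(double));
    for (int i = 0; i < N; i++)
        for (int j = 0; j < 2; j++)
            for (int k = 0; k < N; k++)
                buf[(i*2 + j)*N + k] = array[i][N-2+j][k];
    
    MPI_Send(buf, (int)(2 * N * N * sizeof(double)), MPI_BYTE, neigh, ytag, MPI_COMM_WORLD);
    free(buf);
    /* ...and the receiver has to run the mirror-image unpacking loop */

    More code, an extra copy, and an extra buffer for every message.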

    All of high-performance computing ends up coming down to understanding memory and data layout, and using higher-level abstractions helps with that.

    If you are having trouble with MPI_Type_create_struct, you've come to the right place (or one of them). If you've come to find someone to agree with you that yes, learning new stuff is too hard and not worth it, you're probably in the wrong place.

    Edited to add: I agree that structs are a pain to deal with for serialization - not just in MPI - in C and Fortran, for which I blame their inexcusable lack of any sort of even rudimentary introspection. To describe them you have to reiterate their types and counts, which violates the DRY principle. It's a mess all around, and there's probably more than one code out there that just uses sizeof(struct foo) MPI_BYTEs to describe them. But here's a concrete example of where that would fail.

    Now that you're sending and receiving those structs correctly, you decide to save them to a file, using MPI-IO (or, for that matter, HDF5 or NetCDF or...). You describe them the same way you describe them for communication, of course: as sizeof(struct foo) bytes.

    C tells you almost nothing about how these structs are laid out, however; the compiler is allowed to do all sorts of things to the layout, in particular inserting padding. This generally isn't a problem for communication if all tasks are running the same code compiled with the same compiler and flags on the same sort of machine.
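
    For instance, with a hypothetical struct of my own choosing:

    struct foo {
        char   tag;    /* 1 byte                          */
        double value;  /* 8 bytes, wants 8-byte alignment */
    };

    On a typical 64-bit ABI the compiler inserts 7 bytes of padding after tag so that value is 8-byte aligned, and sizeof(struct foo) is 16, not 9; compile with packing enabled, or on a different ABI, and both the size and the offsets change.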

    But now when you inevitably load that file using the same code but compiled with a different compiler, or even the same compiler but different flags, all bets are off. The data layout may be different, resulting in garbage values - or the amount of padding may be different, causing you to read past the end of the file.

    You could solve this by describing the data differently for file I/O and communications, but now it's hard to argue that this is making things simpler. You're better off just describing the data correctly to begin with.
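
    Here is a minimal sketch, reusing the hypothetical struct foo from above, of what describing it correctly can look like with MPI_Type_create_struct and offsetof, so the datatype follows whatever layout the compiler actually chose (the variable names are mine):

    #include <stddef.h>   /* offsetof */
    
    struct foo { char tag; double value; };
    
    int          blocklens[2] = { 1, 1 };
    MPI_Aint     offsets[2]   = { offsetof(struct foo, tag),
                                  offsetof(struct foo, value) };
    MPI_Datatype types[2]     = { MPI_CHAR, MPI_DOUBLE };
    MPI_Datatype tmp, footype;
    
    MPI_Type_create_struct(2, blocklens, offsets, types, &tmp);
    /* resize so consecutive elements of an array of struct foo
       line up, trailing padding included */
    MPI_Type_create_resized(tmp, 0, sizeof(struct foo), &footype);
    MPI_Type_free(&tmp);
    MPI_Type_commit(&footype);

    Yes, the field types and counts get repeated by hand, which is exactly the DRY complaint above; but once written, the one footype can be used for sends, collectives, and MPI-IO file views alike, and the padding question is answered once, by the compiler, instead of guessed at everywhere the data crosses a boundary.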