Tags: c, arrays, mpi, contiguous, row-major-order

With MPI are user-defined datatypes useless when there is a contiguous array?


In my program I send some rows of a matrix to the other processes. I'm coding in C, which is row-major, and the matrix is allocated as a 1D array:

/* One contiguous block holding all the elements. */
matrixInArrayB = malloc(height * width * sizeof(int));
/* Row pointers into that block, so matrixB[y][x] works. */
matrixB = malloc(height * sizeof(int*));
for (int y = 0; y < height; y++) {
    matrixB[y] = &matrixInArrayB[y * width];
}

I send them like this:

MPI_Isend(&matrixB[0][0], width * height, MPI_INT, dest, tag,
          MPI_COMM_WORLD, &requestesForB[k]);

My doubt is whether I have to use some ad-hoc datatype to ensure the contiguity of the rows, for example:

int MPI_Type_contiguous(int count,
                        MPI_Datatype oldtype,
                        MPI_Datatype *newtype)

Solution

  • No, you don't need to define your own datatypes if you don't want to. However, properly used, they can be very useful.
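
    For the contiguous matrix in the question, plain MPI_INT sends and a contiguous datatype transfer exactly the same bytes. A minimal sketch of the equivalence (whole_matrix_type and the request variables are illustrative names, not from the question):

    MPI_Datatype whole_matrix_type;
    MPI_Request  req1, req2;

    MPI_Type_contiguous(width * height, MPI_INT, &whole_matrix_type);
    MPI_Type_commit(&whole_matrix_type);

    /* Sending width*height ints directly... */
    MPI_Isend(&matrixB[0][0], width * height, MPI_INT, dest, tag,
              MPI_COMM_WORLD, &req1);
    /* ...is equivalent to sending one element of the contiguous type. */
    MPI_Isend(&matrixB[0][0], 1, whole_matrix_type, dest, tag,
              MPI_COMM_WORLD, &req2);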


    Let's say you have the following structure to describe your matrix, rather than an array of pointers to rows of data:

    typedef struct {
        int      rows;     /* number of rows */
        int      cols;     /* number of columns */
        ssize_t  rowstep;  /* elements between vertically adjacent entries */
        ssize_t  colstep;  /* elements between horizontally adjacent entries */
        int     *data;     /* pointer to element (0, 0) */
    } matrix;
    

    where the data element at row r, column c of matrix m is m.data[r*m.rowstep + c*m.colstep]. I've outlined an even better version of this structure in this answer.
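
    For the question's plain row-major matrix, rowstep would be width and colstep would be 1. A tiny accessor makes the layout explicit (matrix_get is an illustrative helper, not part of the structure):

    /* Illustrative accessor: element at row r, column c. */
    static inline int matrix_get(const matrix m, int r, int c)
    {
        return m.data[r * m.rowstep + c * m.colstep];
    }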

    Then, you can use MPI_Type_create_hvector() to create types corresponding to a row, a column, or the main diagonal of this particular type of matrix (with its specific size and steps):

    MPI_Datatype row_vector_type, col_vector_type, diag_vector_type;
    int  n = (m.rows <= m.cols) ? m.rows : m.cols; /* min(m.rows, m.cols) */

    /* A row has m.cols elements, each m.colstep elements apart. */
    MPI_Type_create_hvector(m.cols, 1, m.colstep * sizeof m.data[0],
                            MPI_INT, &row_vector_type);
    /* A column has m.rows elements, each m.rowstep elements apart. */
    MPI_Type_create_hvector(m.rows, 1, m.rowstep * sizeof m.data[0],
                            MPI_INT, &col_vector_type);
    /* The main diagonal advances one row and one column per element. */
    MPI_Type_create_hvector(n, 1, (m.rowstep + m.colstep) * sizeof m.data[0],
                            MPI_INT, &diag_vector_type);

    MPI_Type_commit(&row_vector_type);
    MPI_Type_commit(&col_vector_type);
    MPI_Type_commit(&diag_vector_type);
    

    To refer to row r, you use m.data + r*m.rowstep.
    To refer to column c, you use m.data + c*m.colstep.
    To refer to the main diagonal, you use m.data.
    The count argument is always 1, because each such datatype describes a complete row, column, or diagonal.
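
    In other words, the same send code works for any of them; for example (a sketch, with dest, source, tag, r, and c assumed to be in scope):

    /* Send row r as a single element of the committed row type. */
    MPI_Send(m.data + r * m.rowstep, 1, row_vector_type, dest, tag,
             MPI_COMM_WORLD);

    /* Receive column c the same way; only the start address and
     * datatype change, not the structure of the call. */
    MPI_Recv(m.data + c * m.colstep, 1, col_vector_type, source, tag,
             MPI_COMM_WORLD, MPI_STATUS_IGNORE);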

    It is also possible to define datatypes corresponding to any contiguous rectangular part of the matrix, as sketched below.
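
    For example, an h-by-w block can be described by nesting hvectors, one for a partial row and one stacking h of them (a sketch; h, w, and the type names are illustrative):

    /* A w-element slice of one row, elements m.colstep apart. */
    MPI_Datatype block_row_type, block_type;
    MPI_Type_create_hvector(w, 1, m.colstep * sizeof m.data[0],
                            MPI_INT, &block_row_type);
    /* h such row slices, m.rowstep elements apart: an h-by-w block. */
    MPI_Type_create_hvector(h, 1, m.rowstep * sizeof m.data[0],
                            block_row_type, &block_type);
    MPI_Type_commit(&block_type);

    The block whose top-left corner is at row r, column c then starts at m.data + r*m.rowstep + c*m.colstep, and is sent with count 1.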

    The MPI library will then gather the data on a send, and scatter the data on receive. The actual data items do not need to be consecutive in memory.


    In the above example, one can use the same code to send and receive any row, column, or diagonal vector. Using custom datatypes, one does not need to differentiate between them, aside from defining the types as shown above.

    Simplifying code tends to yield more robust code, with fewer errors. (You could say that bugs are either of the off-by-one type, i.e. hard to spot but not complicated, or of the complicated type, where different aspects of the code interact in an unexpected or inadvertent way to cause the bug.)

    So, I would say user-defined MPI datatypes are not useless even when the data is in a contiguous array, because they can be used to simplify the code, and thus make it more robust and easy to maintain.

    Obviously, not all MPI code uses user-defined MPI datatypes well. Using user-defined datatypes whenever possible is definitely not the solution.

    My point is that you determine their usefulness on a case-by-case basis, depending on whether they make the code simpler, easier to read, and more robust.