Tags: arrays, fortran, mpi, fortran90, derived-types

Trying to pass MPI derived types between processors (and failing)


I am trying to parallelize a customer's Fortran code with MPI. f is an array of 4-byte reals dimensioned f(dimx,dimy,dimz,dimf). I need the various processes to work on different parts of the array's first dimension. (I would rather have started with the last, but it wasn't up to me.) So I define a derived type mpi_x_interface like so:

call mpi_type_vector(dimy*dimz*dimf, 1, dimx, MPI_REAL,  &
                     mpi_x_interface, mpi_err)
call mpi_type_commit(mpi_x_interface, mpi_err)

My intent is that a single mpi_x_interface will contain all of the data in 'f' at some given first index "i". That is, for a given i, it should contain f(i,:,:,:). (Note that at this stage of the game, all procs have a complete copy of f. I intend to eventually split f up between the procs, but I want proc 0 to have a full copy for the purpose of gathering.)

ptsinproc is an array containing the number of "i" indices handled by each proc. x_slab_displs is the displacement from the beginning of the array for each proc. For two procs, which is what I am testing on, they are ptsinproc=(/61,60/), x_slab_displs=(/0,61/). myminpt is a simple integer giving the minimum index handled in each proc.
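
To make the decomposition concrete, it amounts to something like the following sketch (this is not the literal customer code; p is just a loop index, and ptsinproc and x_slab_displs are declared with bounds 0:nprocs-1 so they can be indexed by myrank):

do p = 0, nprocs-1
  ptsinproc(p) = dimx/nprocs                                   ! base share of "i" indices
  if (p < mod(dimx, nprocs)) ptsinproc(p) = ptsinproc(p) + 1   ! spread the remainder
end do

x_slab_displs(0) = 0
do p = 1, nprocs-1
  x_slab_displs(p) = x_slab_displs(p-1) + ptsinproc(p-1)       ! running offset in "i" units
end do

myminpt = x_slab_displs(myrank) + 1                            ! first "i" index on this rank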

So now I want to gather all of f into proc 0 and I run

    if (myrank == 0) then
      call mpi_gatherv(MPI_IN_PLACE, ptsinproc(myrank),
 +                     mpi_x_interface, f(1,1,1,1), ptsinproc,
 +                     x_slab_displs, mpi_x_interface, 0,
 +                     mpi_comm_world, mpi_err)
    else
      call mpi_gatherv(f(myminpt,1,1,1), ptsinproc(myrank),
 +                     mpi_x_interface, f(1,1,1,1), ptsinproc,
 +                     x_slab_displs, mpi_x_interface, 0,
 +                     mpi_comm_world, mpi_err)
    endif

I can send at most one "slab" like this. If I try to send all 60 "slabs" from proc 1 to proc 0, I get a seg fault due to an "invalid memory reference". BTW, even when I send that single slab, the data winds up in the wrong places.

I've checked all the obvious stuff, like making sure myrank and ptsinproc and x_slab_displs are what they should be on all procs. I've looked into the difference between "size" and "extent" and so on, to no avail. I'm at my wit's end. I just don't see what I am doing wrong. And someone might remember that I asked a similar (but different!) question a few months back. I admit I'm just not getting it. Your patience is appreciated.


Solution

  • First off, I just want to say that the reason you're running into so many problems is that you are trying to split up the first (fastest-varying) axis. This is not recommended at all: as written, packing your mpi_x_interface requires a lot of non-contiguous memory accesses, and that means a huge loss in performance.

    Splitting up the slowest axis across MPI processes is a much better strategy. I would highly recommend transposing your 4D matrix so that the x axis is last if you can.
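
    For illustration, here is a rough sketch of what the gather could look like after such a transpose, i.e. with f dimensioned f(dimy,dimz,dimf,dimx) so that all the data for one x index sits contiguously in memory. This is only a sketch under that assumption; the point is that a simple contiguous type (or even plain MPI_REAL with counts scaled by dimy*dimz*dimf) is then enough:

    integer :: mpi_x_slab

    ! One x "slab" is now a single contiguous block of dimy*dimz*dimf reals.
    call mpi_type_contiguous(dimy*dimz*dimf, MPI_REAL, mpi_x_slab, mpi_err)
    call mpi_type_commit(mpi_x_slab, mpi_err)

    ! Counts and displacements stay in units of one slab per x index;
    ! rank 0 sends its own slab to itself here, as in your current setup.
    call mpi_gatherv(f(1,1,1,myminpt), ptsinproc(myrank), mpi_x_slab, &
                     f, ptsinproc, x_slab_displs, mpi_x_slab,         &
                     0, mpi_comm_world, mpi_err)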

    Now to your actual problem(s)...

    Derived datatypes

    As you have deduced, one problem is that the extent of your derived datatype is incorrect (its size, as we'll see, is fine). Let's simplify your problem a bit so I can draw a picture. Say dimy*dimz*dimf=3 and dimx=4. As-is, your datatype mpi_x_interface describes the following data in memory:

    | X |   |   |   | X |   |   |   | X |   |   |   |
    

    That is, every 4th MPI_REAL, 3 of them in total. This is what you want, so far so good: the size of your datatype is correct. However, if you try to send "the next" mpi_x_interface, your MPI implementation will start at the next point in memory (which in your case has not been allocated) and throw an "invalid memory access" at you:

                                                 tries to access and bombs
                                                     vvv
    | X |   |   |   | X |   |   |   | X |   |   |   | Y |   |   |   | Y | ...
    

    What you need to tell MPI as part of your datatype is that "the next" mpi_x_interface starts only 1 real into the array. This is accomplished by redefining the "extent" of your derived datatype by calling MPI_Type_create_resized(). In your case, you need to write

    integer :: mpi_x_interface, mpi_x_interface_resized
    integer(kind=MPI_ADDRESS_KIND) :: lb, extent
    integer, parameter :: SIZEOF_REAL = 4 ! or whatever the kind of f actually is

    lb = 0
    extent = 1*SIZEOF_REAL   ! "the next" type starts just 1 real further on

    call mpi_type_vector(dimy*dimz*dimf, 1, dimx, MPI_REAL,  &
                         mpi_x_interface, mpi_err)
    call mpi_type_create_resized(mpi_x_interface, lb, extent, &
                                 mpi_x_interface_resized, mpi_err)
    call mpi_type_commit(mpi_x_interface_resized, mpi_err)
    

    Then the "next" 3 mpi_x_interface_resized types (marked Y, Z and A) line up with the rest of the array:

    | X | Y | Z | A | X | Y | Z | A | X | Y | Z | A |
    

    as expected.
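
    If you want to convince yourself that the resize did what you expect, you can query the committed type. This is not part of the fix, just a quick sanity check (lb_chk and extent_chk are throwaway local variables):

    integer :: type_size
    integer(kind=MPI_ADDRESS_KIND) :: lb_chk, extent_chk

    ! size   = bytes actually transferred per type (3 reals in the toy example)
    ! extent = stride to the start of "the next" type (1 real after resizing)
    call mpi_type_size(mpi_x_interface_resized, type_size, mpi_err)
    call mpi_type_get_extent(mpi_x_interface_resized, lb_chk, extent_chk, mpi_err)
    if (myrank == 0) print *, 'size =', type_size, ', extent =', extent_chk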

    MPI_Gatherv

    Now that you have correctly defined the extent of your datatype, calling mpi_gatherv with displacements given in units of that datatype should work as expected.

    Personally, I don't think there is any need for fancy logic with MPI_IN_PLACE in this collective. You can simply set myminpt=1 on myrank==0 and then make the same call on every rank:

       call mpi_gatherv(f(myminpt,1,1,1), ptsinproc(myrank),
    +                     mpi_x_interface_resized, f, ptsinproc,
    +                     x_slab_displs, mpi_x_interface_resized, 0,
    +                     mpi_comm_world, mpi_err)
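
    One caveat: with myminpt=1 on rank 0, the root's send buffer overlaps its receive buffer, which is exactly the situation MPI_IN_PLACE was introduced for. In practice the root's data is already where it belongs, but if your MPI library complains about aliased buffers, keep the MPI_IN_PLACE branch on rank 0 only. And once you're done with them, release the datatypes with mpi_type_free.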