I am trying to parallelize a customer's Fortran code with MPI. f is an array of 4-byte reals dimensioned f(dimx,dimy,dimz,dimf). I need the various processes to work on different parts of the array's first dimension. (I would rather have started with the last, but it wasn't up to me.) So I define a derived type mpi_x_interface like so:
call mpi_type_vector(dimy*dimz*dimf, 1, dimx, MPI_REAL, &
                     mpi_x_interface, mpi_err)
call mpi_type_commit(mpi_x_interface, mpi_err)
My intent is that a single mpi_x_interface will contain all of the data in f at some given first index i. That is, for a given i, it should contain f(i,:,:,:). (Note that at this stage of the game, all procs have a complete copy of f. I intend to eventually split f up between the procs, except that I want proc 0 to keep a full copy for the purpose of gathering.)
ptsinproc is an array containing the number of "i" indices handled by each proc. x_slab_displs is the displacement from the beginning of the array for each proc. For the two procs I am testing on, they are ptsinproc=(/61,60/) and x_slab_displs=(/0,61/). myminpt is a simple integer giving the minimum "i" index handled by each proc.
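For concreteness, here is a sketch of how such a decomposition might be computed (this is not my exact code; it assumes dimx = 61 + 60 = 121, and that ptsinproc and x_slab_displs are dimensioned 0:nprocs-1 so that ptsinproc(myrank) works directly):

integer :: p

! even block decomposition of the dimx indices over nprocs ranks
do p = 0, nprocs-1
    ptsinproc(p) = dimx/nprocs
    if (p < mod(dimx, nprocs)) ptsinproc(p) = ptsinproc(p) + 1
end do
x_slab_displs(0) = 0
do p = 1, nprocs-1
    x_slab_displs(p) = x_slab_displs(p-1) + ptsinproc(p-1)
end do
myminpt = x_slab_displs(myrank) + 1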
So now I want to gather all of f into proc 0, and I run:
if (myrank == 0) then
call mpi_gatherv(MPI_IN_PLACE, ptsinproc(myrank),
+ mpi_x_interface, f(1,1,1,1), ptsinproc,
+ x_slab_displs, mpi_x_interface, 0,
+ mpi_comm_world, mpi_err)
else
call mpi_gatherv(f(myminpt,1,1,1), ptsinproc(myrank),
+ mpi_x_interface, f(1,1,1,1), ptsinproc,
+ x_slab_displs, mpi_x_interface, 0,
+ mpi_comm_world, mpi_err)
endif
I can send at most one "slab" like this. If I try to send all 60 "slabs" from proc 1 to proc 0, I get a seg fault due to an "invalid memory reference". BTW, even when I send that single slab, the data winds up in the wrong places.
I've checked all the obvious stuff, like making sure myrank and ptsinproc and x_slab_displs are what they should be on all procs. I've looked into the difference between "size" and "extent" and so on, to no avail. I'm at my wit's end. I just don't see what I am doing wrong. And someone might remember that I asked a similar (but different!) question a few months back. I admit I'm just not getting it. Your patience is appreciated.
First off, I just want to say that the reason you're running into so many problems is that you are trying to split up the first (fastest) axis. This is not recommended at all, because packing your mpi_x_interface as-is requires a lot of non-contiguous memory accesses. We're talking a huge loss in performance.

Splitting up the slowest axis across MPI processes is a much better strategy. I would highly recommend transposing your 4D matrix so that the x axis is last, if you can.
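To illustrate: if f were instead dimensioned f(dimy,dimz,dimf,dimx), each slab f(:,:,:,i) would be contiguous in memory and the gather would need no derived type at all. A sketch (not your code; counts and displacements are simply expressed in numbers of MPI_REAL, and slabsize is a helper variable I made up):

integer :: slabsize

slabsize = dimy*dimz*dimf   ! reals per x-slab, now contiguous
call mpi_gatherv(f(1,1,1,myminpt), ptsinproc(myrank)*slabsize, MPI_REAL, &
                 f, ptsinproc*slabsize, x_slab_displs*slabsize, MPI_REAL, &
                 0, mpi_comm_world, mpi_err)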
Now to your actual problem(s)...
Derived datatypes
As you have deduced, one problem is that the extent of your derived datatype is incorrect (its size, as we will see, is fine). Let's simplify your problem a bit so I can draw a picture. Say dimy*dimz*dimf=3 and dimx=4. As-is, your datatype mpi_x_interface describes the following data in memory:
| X |   |   |   | X |   |   |   | X |   |   |   |
That is, every 4th MPI_REAL, and 3 of them in total. Seeing as this is what you want, so far so good: the size of your datatype is correct. However, if you try to send "the next" mpi_x_interface, your MPI implementation will start at the next point in memory (which in your case has not been allocated) and throw an "invalid memory access" at you:
                                      tries to access and bombs
                                                 vvv
| X |   |   |   | X |   |   |   | X |   |   |   | Y |   |   |   | Y | ...
What you need to tell MPI as part of your datatype is that "the next" mpi_x_interface starts only 1 real into the array. This is accomplished by redefining the "extent" of your derived datatype with a call to MPI_Type_create_resized(). In your case, you need to write:
integer :: mpi_x_interface, mpi_x_interface_resized
integer(kind=MPI_ADDRESS_KIND) :: lb, extent
integer, parameter :: SIZEOF_REAL = 4 ! or whatever f actually is

call mpi_type_vector(dimy*dimz*dimf, 1, dimx, MPI_REAL, &
                     mpi_x_interface, mpi_err)
! the new lower bound and extent must be INTEGER(KIND=MPI_ADDRESS_KIND)
lb     = 0
extent = 1*SIZEOF_REAL
call mpi_type_create_resized(mpi_x_interface, lb, extent, &
                             mpi_x_interface_resized, mpi_err)
call mpi_type_commit(mpi_x_interface_resized, mpi_err)
Then, calling "the next" 3 mpi_x_interface_resized
will result in:
| X | Y | Z | A | X | Y | Z | A | X | Y | Z | A |
as expected.
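If you want to see the difference between "size" and "extent" concretely, you can query both (a quick sketch, reusing lb and extent from above plus a new integer type_size; the byte values in the comments assume the simplified dimensions and 4-byte reals):

integer :: type_size

call mpi_type_size(mpi_x_interface, type_size, mpi_err)                  ! size:   3 reals = 12 bytes
call mpi_type_get_extent(mpi_x_interface, lb, extent, mpi_err)           ! extent: 9 reals = 36 bytes
call mpi_type_size(mpi_x_interface_resized, type_size, mpi_err)          ! size:   still 12 bytes
call mpi_type_get_extent(mpi_x_interface_resized, lb, extent, mpi_err)   ! extent: now 1 real = 4 bytes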
MPI_Gatherv
Now that you have correctly defined the extent of your datatype, calling mpi_gatherv with offsets expressed in terms of that datatype should work as expected.
Personally, I don't think there is any need for fancy logic with MPI_IN_PLACE in this collective operation. You can simply set myminpt=1 on myrank==0. Then you can make the same call on every rank:
call mpi_gatherv(f(myminpt,1,1,1), ptsinproc(myrank),
+ mpi_x_interface_resized, f, ptsinproc,
+ x_slab_displs, mpi_x_interface_resized, 0,
+ mpi_comm_world, mpi_err)
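Finally, for completeness, here is a minimal self-contained sketch that ties the pieces together. The dimensions are made up, ptsinproc and x_slab_displs are dimensioned from 0, and it gathers into a separate buffer g on rank 0 just to keep the example simple (none of this is your actual code). Run it with, say, 2 or 3 ranks; rank 0 should print 1.0 through 7.0.

program gather_x_slabs
    use mpi
    implicit none
    integer, parameter :: dimx = 7, dimy = 2, dimz = 3, dimf = 2
    real    :: f(dimx, dimy, dimz, dimf)   ! distributed array (full copy on every rank here)
    real    :: g(dimx, dimy, dimz, dimf)   ! gather target on rank 0
    integer :: myrank, nprocs, mpi_err, p, i, myminpt
    integer :: mpi_x_interface, mpi_x_interface_resized
    integer(kind=MPI_ADDRESS_KIND) :: lb, real_extent
    integer, allocatable :: ptsinproc(:), x_slab_displs(:)

    call mpi_init(mpi_err)
    call mpi_comm_rank(mpi_comm_world, myrank, mpi_err)
    call mpi_comm_size(mpi_comm_world, nprocs, mpi_err)

    ! block decomposition of the first (x) dimension
    allocate(ptsinproc(0:nprocs-1), x_slab_displs(0:nprocs-1))
    do p = 0, nprocs-1
        ptsinproc(p) = dimx/nprocs
        if (p < mod(dimx, nprocs)) ptsinproc(p) = ptsinproc(p) + 1
    end do
    x_slab_displs(0) = 0
    do p = 1, nprocs-1
        x_slab_displs(p) = x_slab_displs(p-1) + ptsinproc(p-1)
    end do
    myminpt = x_slab_displs(myrank) + 1

    ! each rank fills only the x-slabs it owns with a recognizable value
    f = -1.0
    do i = myminpt, myminpt + ptsinproc(myrank) - 1
        f(i,:,:,:) = real(i)
    end do

    ! one slab f(i,:,:,:), resized so that consecutive slabs start 1 real apart
    call mpi_type_vector(dimy*dimz*dimf, 1, dimx, MPI_REAL, mpi_x_interface, mpi_err)
    call mpi_type_get_extent(MPI_REAL, lb, real_extent, mpi_err)
    lb = 0
    call mpi_type_create_resized(mpi_x_interface, lb, real_extent, &
                                 mpi_x_interface_resized, mpi_err)
    call mpi_type_commit(mpi_x_interface_resized, mpi_err)

    ! gather all slabs into g on rank 0; displacements are in units of 1 real
    call mpi_gatherv(f(myminpt,1,1,1), ptsinproc(myrank), mpi_x_interface_resized, &
                     g, ptsinproc, x_slab_displs, mpi_x_interface_resized, &
                     0, mpi_comm_world, mpi_err)

    if (myrank == 0) print *, g(:,1,1,1)   ! expect 1.0 2.0 ... 7.0

    call mpi_type_free(mpi_x_interface_resized, mpi_err)
    call mpi_finalize(mpi_err)
end program gather_x_slabs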