I'm running into an issue where mpi_gather only returns a small subset of the vector that I'm trying to pass. Note, I'm running this with np 1, but it also happens with np 2 and np 3. NAT = 3 (nat = number of atoms), and there are 194 unique pairs.
To make this happen, I have two derived data types in Fortran (only the first, dtlrdh_lut, is shown here):
type dtlrdh_lut
   sequence
   integer :: p
   integer :: q
   integer :: ind
   real(dp), dimension(3, 3) :: TLR
   real(dp), dimension(3, 3, 3, 3) :: dTLRdh
end type dtlrdh_lut
In my subroutine, I have defined my variables like so:
type(dtlrdh_lut), dimension(:), allocatable :: my_dtlrdh, collected_dtlrdh
integer :: dh_dtype, dr_dtype, dh_types(5), dr_types(6), dh_blocks(5), dr_blocks(6)
INTEGER(kind=MPI_ADDRESS_KIND) :: dh_offsets(5), dr_offsets(6)
integer :: numtasks, rank, ierr, dh_displs(nproc_image), dr_displs(nproc_image)
integer :: n, status(mpi_status_size)
I split up the work between processes in another method, then count the number of lookup-table elements that need to be computed on this process and allocate the local lookup table on this specific node like so:
my_num_pairs = 0
do i = 1, num_pairs, 1
   if (unique_pairs(i)%cpu .eq. me_image) then
      my_num_pairs = my_num_pairs + 1
   end if
end do
if (.not. allocated(my_dtlrdh)) allocate(my_dtlrdh(my_num_pairs))
I also allocate and zero out the lookup table that everything will be combined back into with the following code:
if (me_image .eq. root_image) then
   if (.not. allocated(collected_dtlrdh)) allocate(collected_dtlrdh(num_pairs))
   do i = 1, my_num_pairs, 1
      collected_dtlrdh(i)%p = 0
      collected_dtlrdh(i)%q = 0
      collected_dtlrdh(i)%ind = 0
      collected_dtlrdh(i)%TLR = 0.0_DP
      collected_dtlrdh(i)%dTLRdh = 0.0_DP
   end do
end if
I then fill in the lookup table, but I'll skip that code since it's long and not relevant here. With that done, it's time for the MPI calls that gather everything back on the root process.
call mpi_get_address(my_dtlrdh(1)%p, dh_offsets(1), ierr)
call mpi_get_address(my_dtlrdh(1)%q, dh_offsets(2), ierr)
call mpi_get_address(my_dtlrdh(1)%ind, dh_offsets(3), ierr)
call mpi_get_address(my_dtlrdh(1)%TLR(1,1), dh_offsets(4), ierr)
call mpi_get_address(my_dtlrdh(1)%dTLRdh(1,1,1,1), dh_offsets(5), ierr)
do i = 2, size(dh_offsets)
   dh_offsets(i) = dh_offsets(i) - dh_offsets(1)
end do
dh_offsets(1) = 0
dh_types = (/MPI_INTEGER, MPI_INTEGER, MPI_INTEGER, MPI_DOUBLE_PRECISION, MPI_DOUBLE_PRECISION/)
dh_blocks = (/1, 1, 1, 3*3, 3*3*3*3/)
call mpi_type_struct(5, dh_blocks, dh_offsets, dh_types, dh_dtype, ierr)
call mpi_type_commit(dh_dtype, ierr)
I then call gather via:
call mpi_gather(my_dtlrdh, my_num_pairs, dh_dtype, &
                collected_dtlrdh, my_num_pairs, dh_dtype, &
                root_image, intra_image_comm, ierr)
After I gather, I can then print out what everything should look like:
do i = 1, num_pairs, 1
   write(stdout, *) my_dtlrdh(i)%p, collected_dtlrdh(i)%p, my_dtlrdh(i)%q, collected_dtlrdh(i)%q
end do
and this is where I see really strange information. The first few elements that are printed out look fine:
1 1 3 3
1 1 6 6
1 1 9 9
But the tail end of my vector looks like I only sent 174 elements instead of the full 194:
17 0 24 0
18 0 19 0
18 0 20 0
18 0 21 0
18 0 22 0
Given that there are 194 pairs, and both num_pairs and my_num_pairs equal 194, I'm confused. I went ahead and started to play around in desperation, and found that if I use this gather call instead of the one above, I get the full vector:
num_pairs = 2*num_pairs+40
call mpi_gather(my_dtlrdh, num_pairs, dh_dtype, &
                collected_dtlrdh, num_pairs, dh_dtype, &
                root_image, intra_image_comm, ierr)
which I found by pure guess-and-check. While this may work for this small test system, it hardly seems like a scalable solution. I'm totally at a loss... any ideas? And please let me know if you need any more information from me.
MPI_TYPE_STRUCT is deprecated in favour of MPI_TYPE_CREATE_STRUCT. The latter takes conceptually the same arguments as the former, but the array of offsets is of type INTEGER(KIND=MPI_ADDRESS_KIND), i.e. the type returned by MPI_GET_ADDRESS.
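As a minimal sketch of that change (reusing the names from your question, and assuming it sits inside the same subroutine so my_dtlrdh, dh_offsets, dh_types, dh_blocks, dh_dtype, i and ierr are all in scope), the datatype construction could look like the following. Your dh_offsets array is already declared INTEGER(kind=MPI_ADDRESS_KIND), which is exactly the kind MPI_TYPE_CREATE_STRUCT expects, so only the constructor call needs to change:

! compute the absolute addresses of each component of the first record
call mpi_get_address(my_dtlrdh(1)%p,               dh_offsets(1), ierr)
call mpi_get_address(my_dtlrdh(1)%q,               dh_offsets(2), ierr)
call mpi_get_address(my_dtlrdh(1)%ind,             dh_offsets(3), ierr)
call mpi_get_address(my_dtlrdh(1)%TLR(1,1),        dh_offsets(4), ierr)
call mpi_get_address(my_dtlrdh(1)%dTLRdh(1,1,1,1), dh_offsets(5), ierr)

! make the displacements relative to the start of the record
do i = 2, size(dh_offsets)
   dh_offsets(i) = dh_offsets(i) - dh_offsets(1)
end do
dh_offsets(1) = 0

dh_types  = (/MPI_INTEGER, MPI_INTEGER, MPI_INTEGER, MPI_DOUBLE_PRECISION, MPI_DOUBLE_PRECISION/)
dh_blocks = (/1, 1, 1, 3*3, 3*3*3*3/)

! MPI_TYPE_CREATE_STRUCT takes the INTEGER(kind=MPI_ADDRESS_KIND) displacements directly
call mpi_type_create_struct(5, dh_blocks, dh_offsets, dh_types, dh_dtype, ierr)
call mpi_type_commit(dh_dtype, ierr)

The mpi_gather call itself can stay as you wrote it; only the datatype constructor changes.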