MPI - Difference between mpi_type_get_extent and mpi_type_get_true_extent

I am having trouble understanding the difference between mpi_type_get_extent and mpi_type_get_true_extent. In practice, I was using the former while expecting the results I later obtained with the latter, so I checked the MPI 3.1 Standard, where I found (in Section 4.1.8, True Extent of Datatypes):

However, the datatype extent cannot be used as an estimate of the amount of space that needs to be allocated, if the user has modified the extent

which made me think that, as long as I hadn't modified the extent of the datatype, the two subroutines should have returned the same values.

But I'm evidently missing something.

Given the following MPI derived datatype,

sizes    = [10,10,10]
subsizes = [ 3, 3, 3]
starts   = [ 2, 2, 2]
CALL MPI_TYPE_CREATE_SUBARRAY(ndims, sizes, subsizes, starts, MPI_ORDER_FORTRAN, MPI_DOUBLE_PRECISION, newtype, ierr)

the following code

call mpi_type_size(newtype, k, ierr)
call mpi_type_get_extent(newtype, lb, extent, ierr)
call mpi_type_get_true_extent(newtype, tlb, textent, ierr)
write(*,*) k/DBS, lb/DBS, extent/DBS, tlb/DBS, textent/DBS ! DBS is the size of double precision

produces the output (obviously the same for all processes)

27   0   1000   222   223

So mpi_type_size behaves as I expect, returning PRODUCT(subsizes)*DBS in k. On the other hand, I would have expected both mpi_type_get_extent and mpi_type_get_true_extent to return what only the latter does (since I have not modified newtype at all), namely 222 and 223, which are starts(1) + starts(2)*sizes(1) + starts(3)*sizes(1)*sizes(2) and 1 + SUM((subsizes - 1)*[1, sizes(1), sizes(1)*sizes(2)]), respectively.
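For reference, the arithmetic behind the three expected values (27, 222 and 223) can be sketched in plain Fortran, with no MPI calls at all. The names strides, nelems, first and span are purely illustrative and do not come from the MPI API; all quantities are in units of one double precision element.

```fortran
program expected_values
  implicit none
  integer, parameter :: ndims = 3
  integer :: sizes(ndims), subsizes(ndims), starts(ndims), strides(ndims)
  integer :: d, nelems, first, span

  sizes    = [10, 10, 10]
  subsizes = [ 3,  3,  3]
  starts   = [ 2,  2,  2]

  ! Column-major (MPI_ORDER_FORTRAN) stride of each dimension, in elements.
  strides(1) = 1
  do d = 2, ndims
    strides(d) = strides(d - 1)*sizes(d - 1)
  end do

  nelems = product(subsizes)                 ! number of selected elements
  first  = sum(starts*strides)               ! offset of the first selected element
  span   = 1 + sum((subsizes - 1)*strides)   ! first to last selected element, inclusive

  print *, nelems, first, span
end program expected_values
```

The three printed values correspond to k/DBS, tlb/DBS and textent/DBS in the output above.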

Why does mpi_type_get_extent return 0 and PRODUCT(sizes) in lb and extent, regardless of subsizes and starts?

I haven't posted an MWE since I get no errors at all (neither at compile time nor at runtime); I simply don't understand how the two aforementioned routines work. I would like someone to help me understand the description of those subroutines in the standard document, and why the results I didn't expect are in fact correct.

EDIT: As requested by @GillesGouaillardet, I have added at the end of this question a "minimal" working example to be run with at least 4 processes (please run it with exactly 4 processes, so that we have the same output). The last lines can be uncommented (with awareness) to show that the types representing non-contiguous memory locations work properly when used with count > 1, once they have been properly resized by means of mpi_type_create_resized. With those lines commented out, the program prints size, lb, extent, true_lb, true_extent for all the types created (including the intermediate, uncommitted ones):

 mpi_type_contiguous                    4                    0                    4                    0                    4
 mpi_type_vector                        4                    0                   13                    0                   13
 mpi_type_vector res                    4                    0                    1                    0                   13
 mpi_type_create_subarray               4                    0                   16                    0                   13
 mpi_type_create_subarray res           4                    0                    1                    0                   13

All types represent one row or one column of a 4 by 4 matrix, so their size is predictably always 4. The column type has extent and true_extent both equal to 4 reals as well, since it represents four contiguous reals in memory. The type created with mpi_type_vector has extent and true_extent both equal to 13 reals, as I expected (see the nice sketch). If I want to use it with count > 1, I must resize it, which changes its extent while true_extent stays the same. Now comes the hard part:
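The 13, and the reason resizing is needed, can be checked with a little arithmetic. The sketch below is plain Fortran without MPI. It assumes the positive-stride rule for a vector type, namely an extent of (count - 1)*stride + blocklength in units of the old type, and the names vec_extent, second_default and second_resized are illustrative only.

```fortran
program vector_extent
  implicit none
  integer, parameter :: n = 4   ! mat is n by n (the question uses 4 processes)
  integer :: vec_extent, second_default, second_resized

  ! mpi_type_vector(n, 1, n, mpi_real, ...): count = n, blocklength = 1, stride = n.
  ! The last block starts (count-1)*stride reals in and is blocklength reals long.
  vec_extent = (n - 1)*n + 1

  ! With count = 2, the second row starts one extent after the first
  ! (zero-based offsets in reals from the buffer address).
  second_default = vec_extent   ! unresized type
  second_resized = 1            ! extent resized to one real

  print *, 'extent of the vector type:', vec_extent
  print *, 'second row, unresized: mat(', mod(second_default, n) + 1, ',', second_default/n + 1, ')'
  print *, 'second row, resized:   mat(', mod(second_resized, n) + 1, ',', second_resized/n + 1, ')'
end program vector_extent
```

Without resizing, the second row of a count = 2 transfer would start at mat(2,4) and run past the end of the matrix; with the extent set to one real it starts at mat(2,1), which is the intended next row.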

What is that 16 as the extent of the type created with mpi_type_create_subarray? To be honest, I would have expected that routine to return an already resized type, ready to be used with count > 1 (i.e. a type with size = 4, extent = 1, true_extent = 13), but it seems it does not: surprisingly to me, the extent is 16, which is the size of the global array!

The question is: why is the extent of a type created with mpi_type_create_subarray the product of the elements of the array_of_sizes argument?

program subarray
use mpi
implicit none
integer :: i, j, k, ierr, myid, npro, rs, mycol, myrowugly, myrow_vec, myrow_sub
integer(kind = mpi_address_kind) :: lb, extent, tlb, textent
real, dimension(:,:), allocatable :: mat
call mpi_init(ierr)
call mpi_comm_rank(mpi_comm_world, myid, ierr)
call mpi_comm_size(mpi_comm_world, npro, ierr)
allocate(mat(npro,npro))
mat = myid*1.0
call mpi_type_size(mpi_real, rs, ierr)

call mpi_type_contiguous(npro, mpi_real, mycol, ierr)
call mpi_type_commit(mycol, ierr)
call mpi_type_size(mycol, k, ierr)
call mpi_type_get_extent(mycol, lb, extent, ierr)
call mpi_type_get_true_extent(mycol, tlb, textent, ierr)
if (myid == 0) print *, 'mpi_type_contiguous         ', k/rs, lb/rs, extent/rs, tlb/rs, textent/rs

call mpi_type_vector(npro, 1, npro, mpi_real, myrowugly, ierr)
call mpi_type_size(myrowugly, k, ierr)
call mpi_type_get_extent(myrowugly, lb, extent, ierr)
call mpi_type_get_true_extent(myrowugly, tlb, textent, ierr)
if (myid == 0) print *, 'mpi_type_vector             ', k/rs, lb/rs, extent/rs, tlb/rs, textent/rs
call mpi_type_create_resized(myrowugly, int(0, mpi_address_kind)*rs, int(1, mpi_address_kind)*rs, myrow_vec, ierr)
call mpi_type_commit(myrow_vec, ierr)
call mpi_type_size(myrow_vec, k, ierr)
call mpi_type_get_extent(myrow_vec, lb, extent, ierr)
call mpi_type_get_true_extent(myrow_vec, tlb, textent, ierr)
if (myid == 0) print *, 'mpi_type_vector res         ', k/rs, lb/rs, extent/rs, tlb/rs, textent/rs

call mpi_type_create_subarray(2, [npro, npro], [1, npro], [0, 0], mpi_order_fortran, mpi_real, myrowugly, ierr)
call mpi_type_size(myrowugly, k, ierr)
call mpi_type_get_extent(myrowugly, lb, extent, ierr)
call mpi_type_get_true_extent(myrowugly, tlb, textent, ierr)
if (myid == 0) print *, 'mpi_type_create_subarray    ', k/rs, lb/rs, extent/rs, tlb/rs, textent/rs

call mpi_type_create_resized(myrowugly, int(0, mpi_address_kind)*rs, int(1, mpi_address_kind)*rs, myrow_sub, ierr)
call mpi_type_commit(myrow_sub, ierr)
call mpi_type_size(myrow_sub, k, ierr)
call mpi_type_get_extent(myrow_sub, lb, extent, ierr)
call mpi_type_get_true_extent(myrow_sub, tlb, textent, ierr)
if (myid == 0) print *, 'mpi_type_create_subarray res', k/rs, lb/rs, extent/rs, tlb/rs, textent/rs

!if (myid == 0) call mpi_send(mat(1,1), 2, mycol, 1, 666, mpi_comm_world, ierr)
!if (myid == 0) call mpi_recv(mat(1,3), 2, mycol, 1, 666, mpi_comm_world, mpi_status_ignore, ierr)
!if (myid == 1) call mpi_recv(mat(1,1), 2, mycol, 0, 666, mpi_comm_world, mpi_status_ignore, ierr)
!if (myid == 1) call mpi_send(mat(1,3), 2, mycol, 0, 666, mpi_comm_world, ierr)
!if (myid == 0) call mpi_send(mat(1,1), 2, myrow_vec, 1, 666, mpi_comm_world, ierr)
!if (myid == 0) call mpi_recv(mat(3,1), 2, myrow_vec, 1, 666, mpi_comm_world, mpi_status_ignore, ierr)
!if (myid == 1) call mpi_recv(mat(1,1), 2, myrow_vec, 0, 666, mpi_comm_world, mpi_status_ignore, ierr)
!if (myid == 1) call mpi_send(mat(3,1), 2, myrow_vec, 0, 666, mpi_comm_world, ierr)
!if (myid == 0) call mpi_send(mat(1,1), 2, myrow_sub, 1, 666, mpi_comm_world, ierr)
!if (myid == 0) call mpi_recv(mat(3,1), 2, myrow_sub, 1, 666, mpi_comm_world, mpi_status_ignore, ierr)
!if (myid == 1) call mpi_recv(mat(1,1), 2, myrow_sub, 0, 666, mpi_comm_world, mpi_status_ignore, ierr)
!if (myid == 1) call mpi_send(mat(3,1), 2, myrow_sub, 0, 666, mpi_comm_world, ierr)
!do i = 0, npro - 1
!if (myid == i) then
!print *, ""
!print *, myid
!do j = 1, npro
!print *, mat(j,:)
!end do
!end if
!call mpi_barrier(mpi_comm_world, ierr)
!end do

call mpi_finalize(ierr)
end program subarray

Solution

MPI_Type_create_subarray() creates a derived datatype whose extent is, by definition, that of the full array: the product of all the sizes times the extent of the old type.
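As a sanity check, the following plain Fortran sketch (no MPI calls) reproduces the numbers from the question under that definition. It assumes MPI_ORDER_FORTRAN, works in units of one old-type element, and subarray_bounds is a hypothetical helper written for this illustration, not an MPI routine.

```fortran
program subarray_bounds_demo
  implicit none
  integer :: q1(4), q2(4)

  q1 = subarray_bounds([10, 10, 10], [3, 3, 3], [2, 2, 2])
  q2 = subarray_bounds([4, 4], [1, 4], [0, 0])

  print *, 'lb, extent, true_lb, true_extent'
  print *, q1
  print *, q2

contains

  ! Hypothetical helper: bounds of a column-major subarray type, in elements.
  function subarray_bounds(sizes, subsizes, starts) result(b)
    integer, intent(in) :: sizes(:), subsizes(:), starts(:)
    integer :: b(4)
    integer :: strides(size(sizes)), d

    strides(1) = 1
    do d = 2, size(sizes)
      strides(d) = strides(d - 1)*sizes(d - 1)
    end do

    b(1) = 0                                 ! lb: start of the full array
    b(2) = product(sizes)                    ! extent: the full array
    b(3) = sum(starts*strides)               ! true_lb: first selected element
    b(4) = 1 + sum((subsizes - 1)*strides)   ! true_extent: first to last selected
  end function subarray_bounds

end program subarray_bounds_demo
```

The first call gives 0, 1000, 222, 223 and the second gives 0, 16, 0, 13, matching what mpi_type_get_extent and mpi_type_get_true_extent reported in the question.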

The definition is on page 96 of the MPI 3.1 standard.

MPI_Type_create_subarray() is generally used for MPI-IO, so this definition of the extent makes sense there.

It might not be what you want in this very specific case, but think of a 2x2 subarray of a 4x4 array. What extent would you expect?
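To make that rhetorical question concrete, here is a small sketch in plain Fortran (no MPI) that lists element offsets for the 2x2 case. It assumes column-major storage and starts = [0, 0]; the names offs and origins are illustrative.

```fortran
program two_by_two
  implicit none
  integer, parameter :: n = 4, m = 2     ! n-by-n array, m-by-m subarray
  integer :: i, j, k
  integer :: offs(m*m), origins((n/m)**2)

  ! Offsets of the elements of the block at starts = [0, 0].
  k = 0
  do j = 0, m - 1
    do i = 0, m - 1
      k = k + 1
      offs(k) = i + j*n
    end do
  end do

  ! Offsets at which each of the (n/m)**2 disjoint blocks begins.
  k = 0
  do j = 0, n/m - 1
    do i = 0, n/m - 1
      k = k + 1
      origins(k) = i*m + j*m*n
    end do
  end do

  print *, 'block at [0,0]     :', offs
  print *, 'block origins      :', origins
  print *, 'count = 2, 2nd item:', offs + n*n
end program two_by_two
```

The four blocks begin at offsets 0, 2, 8 and 10, which are not evenly spaced, so no single extent would step through all of them with count > 1. With the extent equal to the full array, the second item of a count = 2 transfer is simply the same block in the next 4x4 array, which is the behaviour a tiled MPI-IO file view relies on.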