
Designing a derived type with array components


I have struggled to find any concrete information on designing a derived type. I think the best way to discuss this is through a couple of options, so I have made up some sections of code with different applications of the derived type. I would prefer to use dynamic arrays for nparts, index, and refs. I have omitted the code that actually uses the structure (there isn't any, because I made it up), but examples are shown, and in a real routine I intend to use all of the values in the structure at least once.

Option A: Use static arrays in the derived type. Downside is that I would have to guess the array size at compile time.

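! Must be known at compile time, so these are named constants.
Integer, Parameter :: nboxes = 5000
Integer, Parameter :: max_parts = 2000
Integer, Parameter :: packs = 10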

Type Boxes
   Sequence
   Integer :: location, date
   Integer, Dimension(0:packs) :: nparts
   Integer, Dimension(max_parts,packs) :: index
   Real(Kind=8), Dimension(packs,packs) :: refs
End Type Boxes

type(boxes), dimension(:), allocatable :: assembly
allocate(assembly(nboxes))

! Perform some operations on assembly...
do i = 1,nboxes
   do j = 1,packs
      do k = j,packs
         example = assembly(i)%nparts(k) - assembly(i)%nparts(j)
         .
         .
         do m = 1,max_parts
            example = assembly(i)%index(m,j) + assembly(i)%refs(k,j) * assembly(i)%nparts(j)
            .
            .
         end do
      end do
   end do
end do

Option B: Use dynamic arrays in the derived type.

! Defined during execution. Much better.
nboxes = 5000
max_parts = 2000
packs = 10

Type Boxes
   Sequence
   Integer :: location, date
   Integer, Dimension(:), Allocatable :: nparts
   Integer, Dimension(:,:), Allocatable :: index
   Real(Kind=8), Dimension(:,:), Allocatable :: refs
End Type Boxes

type(boxes), dimension(:), allocatable :: assembly
allocate(assembly(nboxes))
do i = 1,nboxes
   allocate(assembly(i)%nparts(0:packs))
   allocate(assembly(i)%index(max_parts,packs))
   allocate(assembly(i)%refs(packs,packs))
end do

! Perform some operations on assembly...
do i = 1,nboxes
   do j = 1,packs
      do k = j,packs
         example = assembly(i)%nparts(k) - assembly(i)%nparts(j)
         .
         .
         do m = 1,max_parts
            example = assembly(i)%index(m,j) + assembly(i)%refs(k,j) * assembly(i)%nparts(j)
            .
            .
         end do
      end do
   end do
end do

Option C: Minimize the number of dynamic arrays used in the derived type and make assembly itself the multi-dimensional array. Notice, though, that this version wastes memory: because assembly has shape (packs,packs,nboxes), nparts and index are stored packs times more often than they need to be.

! Defined during execution. Much better.
nboxes = 5000
max_parts = 2000
packs = 10

Type Boxes
   Sequence
   Integer :: location, date, nparts
   Real(Kind=8) :: refs
   Integer, Dimension(:), Allocatable :: index
End Type Boxes

type(boxes), dimension(:,:,:), allocatable :: assembly
allocate(assembly(packs,packs,nboxes))
do i = 1,nboxes
   do j = 1,packs
      do k = 1,packs
         allocate(assembly(k,j,i)%index(max_parts))
      end do
   end do
end do

! Perform some operations on assembly...
do i = 1,nboxes
   do j = 1,packs
      do k = j,packs
         example = assembly(k,j,i)%nparts - assembly(k,j,i)%nparts
         .
         do m = 1,max_parts
            example = assembly(k,j,i)%index(m) + assembly(k,j,i)%refs * assembly(k,j,i)%nparts
            .
            .
         end do
      end do
   end do
end do
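
To put a rough number on the extra memory in Option C, the following sketch only counts how many index elements each option allocates, using the sizes given above. It is a back-of-the-envelope estimate, not a measurement: the program name is made up, 64-bit integers are used only so the products cannot overflow, and the actual byte count depends on the compiler's default integer size.

```fortran
program index_footprint
   use iso_fortran_env, only: int64
   implicit none
   ! Sizes taken from the examples above.
   integer(int64), parameter :: nboxes = 5000, max_parts = 2000, packs = 10
   integer(int64) :: n_b, n_c

   ! Option B: one index(max_parts,packs) array per box.
   n_b = nboxes * max_parts * packs
   ! Option C: one index(max_parts) array per element of assembly(packs,packs,nboxes).
   n_c = packs * packs * nboxes * max_parts

   print '(A,I0)', 'Option B index elements: ', n_b
   print '(A,I0)', 'Option C index elements: ', n_c
   print '(A,I0)', 'ratio C/B: ', n_c / n_b
end program index_footprint
```

The ratio equals packs, which matches the "packs-times" overhead described for Option C.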

Option D: Another permutation of Option C.

Questions:

  1. Which version is the correct/expected method of designing a derived type for the do loop example shown? Which version is most optimized considering that I would like dynamic array capabilities?
  2. Maybe related to above. How is the memory allocated and accessed? Is the use of SEQUENCE even worthwhile? I think allocated arrays wouldn't show up in sequence anyways. Wouldn't this point to Option C being the best since each section of assembly is smaller?
  3. Should I maybe split this derived type into multiple derived types or get rid of it altogether and just stick to variables? I will be using this derived type over multiple routines and would put it in a module.

Solution

    1. You want the fastest-varying index to be your innermost loop. The fastest-varying index is the first one in a multi-dimensional array. Thus, option B comes close to this goal, though you might want to change the ordering of the dimensions in refs.

    2. A two-dimensional array of shape (m,n), accessed by the indices (i,j), is laid out in memory as k = i+m*(j-1), where k denotes the one-dimensional position in memory. A derived type with allocatable components holds only a reference to each allocated block. Each allocatable array is contiguous in itself, but the blocks may be scattered across memory. So, in your option B, assembly is a contiguous array whose elements contain references to allocatable arrays. Each of nparts, index and refs is contiguous, but they may be located at arbitrary places, with no specific relation to each other within one assembly element or across different assembly elements. SEQUENCE does not make sense here: it forces the compiler to store the components of the derived type in the order you declare them and prohibits it from rearranging them as it sees fit, which may limit performance. I doubt it would have a large effect in your example, but when it is not needed you should leave it out.

    3. No, option B looks perfectly reasonable, in my opinion (except for the sequence).
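
As a minimal sketch of the points above, here is Option B placed in a module without SEQUENCE, filled with the first subscript in the innermost loop, and checked against the k = i+m*(j-1) layout formula. Several things here are assumptions rather than part of the original code: the module name boxes_mod, the helper init_assembly, the small demo sizes, and the use of kind(1.0d0) in place of Kind=8.

```fortran
module boxes_mod
   implicit none
   private
   public :: boxes, dp, init_assembly

   ! Assumption: double precision via kind(1.0d0) instead of the original Kind=8.
   integer, parameter :: dp = kind(1.0d0)

   ! Option B without SEQUENCE.
   type :: boxes
      integer :: location = 0, date = 0
      integer,  allocatable :: nparts(:)
      integer,  allocatable :: index(:,:)
      real(dp), allocatable :: refs(:,:)
   end type boxes

contains

   ! Hypothetical helper: allocate every component and set it to zero.
   subroutine init_assembly(assembly, nboxes, max_parts, packs)
      type(boxes), allocatable, intent(out) :: assembly(:)
      integer, intent(in) :: nboxes, max_parts, packs
      integer :: i

      allocate(assembly(nboxes))
      do i = 1, nboxes
         allocate(assembly(i)%nparts(0:packs), source=0)
         allocate(assembly(i)%index(max_parts, packs), source=0)
         allocate(assembly(i)%refs(packs, packs), source=0.0_dp)
      end do
   end subroutine init_assembly

end module boxes_mod

program demo
   use boxes_mod
   implicit none
   integer, parameter :: nboxes = 4, max_parts = 6, packs = 3   ! small demo sizes
   type(boxes), allocatable :: assembly(:)
   integer, allocatable :: flat(:)
   integer :: i, j, m, k, total
   logical :: ok

   call init_assembly(assembly, nboxes, max_parts, packs)

   ! The first (fastest-varying) subscript runs in the innermost loop.
   ! Each element stores its own column-major position.
   do i = 1, nboxes
      do j = 1, packs
         do m = 1, max_parts
            assembly(i)%index(m, j) = m + max_parts*(j - 1)
         end do
      end do
   end do

   ! reshape copies in array element order, so flat(k) should equal k.
   flat = reshape(assembly(1)%index, [max_parts*packs])
   ok = .true.
   do k = 1, size(flat)
      if (flat(k) /= k) ok = .false.
   end do

   total = 0
   do i = 1, nboxes
      total = total + sum(assembly(i)%index)
   end do

   print '(A,L1)', 'column-major formula holds: ', ok
   print '(A,I0)', 'sum over all boxes: ', total
end program demo
```

Keeping the type and its allocation routine together in one module means every routine that uses assembly sees the same definition through a single use statement.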