Search code examples
oopvectorfortransimulationparticles

Behavior of components when structures are in an array


I am currently working on the simulation of a physical system in Fortran90 with something like 50 millions particles. Each has a position x (to simplify).

For now, I am using a 1D vector that contains the position of each particle. And when I have to iterate on every particle, I just go through that vector (as I took care to sort the particles to limit cache misses).

I am now considering creating a particle class. But what about the access to its position as I iterate ? Will it be as fast as the previous case ?

So, what does the compiler do to store the attributes of an object ? And a fortiori, what about the case with more than one attributes?

Thank you for your time.


Solution

  • On "how are derived types stored":

    Fortran Standard requires components of a sequence type to be stored (in memory) as a sequence of contiguous storage, in components' declaration order. Sequence types are those declared with a SEQUENCE statement, which implies that the type shall have at least one component, each component shall be of an intrinsic or sequence type, shall not be a parameterized or extensible type, and can't have type-bound procedures. If you want this behavior and your type is suitable, make it a sequence type (you may take data alignment into consideration).

    On the other hand, Fortran Standard does not state how compilers have to organize storage for non-sequence derived types. That's not bad at all, as compilers are free to optimize storage. Most of times, you may expect almost the same as sequence types: things stored contiguously whenever posible (padding may apply). Arrays and strings are always contiguous. Pointer and allocatable components are a only reference, for obvious reasons, and their targets lay somewhere else.

    From the Standard:

    A structure resolves into a sequence of components. Unless the structure includes a SEQUENCE statement, the use of this terminology in no way implies that these components are stored in this, or any other, order. Nor is there any requirement that contiguous storage be used. The sequence merely refers to the fact that in writing the definitions there will necessarily be an order in which the components appear, and this will define a sequence of components. This order is of limited significance because a component of an object of derived type will always be accessed by a component name except in the following contexts: the sequence of expressions in a derived-type value constructor, intrinsic assignment, the data values in namelist input data, and the inclusion of the structure in an input/output list of a formatted data transfer, where it is expanded to this sequence of components. Provided the processor adheres to the defined order in these cases, it is otherwise free to organize the storage of the components for any nonsequence structure in memory as best suited to the particular architecture.


    On "Is it faster to have a derived type than independent arrays":

    As @VladmirF said in comment, its a broad topic, depends highly on how are you accessing and operating your data, and has been asked and answered before (Check links on its comment). You may find a lot about it arround (link1, link2) and I'll add this one on "cache blocking" thay may interest you.