Search code examples
performancemultidimensional-arrayfortrandynamic-memory-allocationsubroutine

Fortran - Function vs subroutine performance


Since a couple of years ago or so I was completely new to Fortran, I overused SUBROUTINEs with no arguments, together with shared data, so that these procedures have made calculations on the actual arguments, available through USE statements. Now that I need to reuse some of these procedures (think of computing the divergence in a volume, a big DIMENSION(:,:,:) array, from a vector field in that volume, 3 big DIMENSION(:,:,:) arrays glued together in a derived type), I'd like either to

  • keep them SUBROUTINEs but remove USE statements and use IN/OUT/INOUT dummy arguments (easy), or
  • transform them in FUNCTIONs (little harder since I've to study a bit)

I imagine there could be a difference in performance in the two approaches, that I'd like to understand. In the following MWE I wrote 3 procedures to do the same computation, but I have no idea of how I should choose one or the others; neither I know whether some other approach would be preferable.

As a note, all rank-3 actual arrays in my program are ALLOCATABLE and must be so.

PROGRAM mymod

    IMPLICIT NONE

    TYPE blk3d
        REAL,    DIMENSION(:,:,:), ALLOCATABLE :: values
    END TYPE blk3d
    TYPE(blk3d) :: A, B

    INTEGER, PARAMETER :: n = 2
    INTEGER :: i

    ALLOCATE(A%values(n,n,n))
    A%values = RESHAPE([(i/2.0, i = 1, PRODUCT(SHAPE(A%values)))], SHAPE(A%values))
    print *, A%values

    ! 1st way
    B = myfun(A)
    print *, B%values

    DEALLOCATE(B%values)

    ! 2nd way
    ALLOCATE(B%values(n,n,n))
    CALL mysub(A, B)
    print *, B%values

    DEALLOCATE(B%values)

    ! 3rd way
    ALLOCATE(B%values(n,n,n))
    CALL mysub2(A, B%values)
    print *, B%values

CONTAINS

  FUNCTION myfun(Adummy) RESULT(Bdummy)                                                                                                              
    IMPLICIT NONE

    TYPE(blk3d), INTENT(IN) :: Adummy
    TYPE(blk3d)             :: Bdummy

    ALLOCATE(Bdummy%values, mold = Adummy%values)
    Bdummy%values(:,:,:) = 2*Adummy%values

  END FUNCTION myfun

  SUBROUTINE mysub(Adummy, Bdummy)

    IMPLICIT NONE

    TYPE(blk3d), INTENT(IN)    :: Adummy
    TYPE(blk3d), INTENT(INOUT) :: Bdummy

    Bdummy%values(:,:,:) = 2*Adummy%values

  END SUBROUTINE mysub

  SUBROUTINE mysub2(Adummy, Bdummy)

    IMPLICIT NONE

    TYPE(blk3d),            INTENT(IN)  :: Adummy
    REAL, DIMENSION(:,:,:), INTENT(OUT) :: Bdummy

    Bdummy(:,:,:) = 2*Adummy%values

  END SUBROUTINE mysub2

END PROGRAM mymod

EDIT In my program to do CFD, I use several rank-3 big-sized arrays. These arrays interact with each other in that computations (not simply pointwise +/-/*,...) is performed on some of them to obtain others of them. Think of B which is computed based on A through one of the 4 procedures in the example and then used to upgrade A itself by A = A + B. Am I wrong to think that the 4 options above could accomplish the same task? In this sense I could simply call A = A + myfun(A) if I chose the function approach.

EDIT2 The actual derived type, along with that big rank-3 array, has half a dozen of other fields (scalars, and small static arrays); additionally, most of the variables of this type are arrays, e.g. TYPE(blk3d), DIMENSION(n) :: A.


Solution

  • Choosing between subroutine or function should generally be based on how you will be using the result and how clear it is to the reader.

    What you do want to be concerned with is how many times data gets unnecessarily copied. With big arrays, you would want to reduce that.

    Ignoring the actual work in the procedures, myfun copies the data a second time and (potentially) does two allocations. First the function result variable gets allocated and the data copied to it. Then back in the caller, B%values gets reallocated if it isn't already the same shape as the result and the data copied again, then the function result is deallocated.

    mysub and mysub2 don't have this extra allocate/copy and are pretty much equivalent, though the call to mysub2 may have some extra work setting up a descriptor on the stack. I expect that to be noise if the subroutine does any real work.

    Choosing between mysub and mysub2 really depends on how clear it is in your real application. Having a derived type with only one component seems unrealistic unless you are looking to have an array of these.