What is the most efficient way to pass a large array to a subroutine in Fortran?

I have a physics program written in Fortran that does calculations on a large 3D domain (say with dimension AxBxC). I have various large arrays for my variables that I do operations on. These arrays are defined initially as allocatable arrays because their size is given by user input. I currently compile this code using Intel compilers but would also like a solution that is efficient on gfortran. The code is used on large (thousands of processors) parallel computing facilities.

Sketching out the code,

program hello
implicit none

real, allocatable, dimension(:,:,:)    ::    big_array
integer      ::      A, B, C

! Does some calculations and defines various things based on user input

allocate(big_array(1:A,1:B,1:C)) 

call do_something_to_big_array(big_array)

end program hello

My question is about what is the most efficient way to pass big_array to the subroutine do_something_to_big_array. At the moment, it is done with

subroutine do_something_to_big_array(big_array)
implicit none
real, dimension(1:A, 1:B, 1:C)     ::    big_array

! Does some calculations that alter big_array

end subroutine

But I was wondering whether there is a better-practice way of doing this that is more efficient? These subroutines are called many thousands of times as the code iterates in time.

Similarly, I have some subroutines that use arrays that are only used in local scope but are the size of the entire domain. I've found that defining these as part of the subroutine types significantly increases my memory usage, is there a better way to do this:

subroutine do_something_else(big_array)
implicit none

real, dimension(1:A,1:B,1:C)    ::    big_array
real, dimension(1:A,1:B,1:C)    ::    temp_array ! Array only used in this subroutine for certain calculations

end subroutine

Again, these subroutines are called many times so would it be more efficient to define the arrays in global scope and then just use them there?

Solution

(answer edited after some changes in the question)

How do you get the A, B, and C values in the subroutine? They are not passed in the arguments, and they are not global variables in what you have posted.

Basically there are 3 ways to declare the dummy array argument in the subroutine:

Assumed size: real, dimension(A,B,*) :: big_array. The last size in unspecified
Explicit shape: real, dimension(A,B,C) :: big_array. This is what you are doing
Assumed shape: real, dimension(:,:,:) :: big_array. the sizes are passed by Fortran as hidden arguments. You can inquire then afterwards, e.g. C = size(big_array,3)

Assumed sized is a legacy practice, which is generally discouraged nowadays. Assumed shape requires an interface (either an explicit one, or the routine must be either "contained" or in a module). Assumed shape may also has some impact on the performances, because the dummy array is not guaranteed to be contiguous. There you can help the compiler by explicitly specifying it is: real, contiguous, dimension(:,:,:) :: big_array

Performance wise, if contiguous is specified, all the methods above behave more or less the same.

And since your initial array is allocatable, there's a fourth way:

real, allocatable, dimension(:,:,:) :: big_array (that is, the declaration in the routine is exactly the same as in the main program). You don't need to specify contiguous, as allocatable implies contiguity.

Regarding your tmp_array, there's no unique answer: it all depends on the amount of computations you are doing in the routine, which may or may not hide the allocation overhead.