I am trying to replicate an old Fortran code so I can use it in Python. The hardest part is importing the data, which is stored in a rather convoluted shape.
I have a text file with 14500 data points stored in 5 columns separated by commas:
9.832e-06, 2.121e-16, 3.862e-21, 2.677e-23, 1.099e-22,
5.314e-23, 1.366e-23, 6.504e-23, 3.936e-23, 1.175e-22,
1.033e-23, 5.262e-23, 3.457e-23, 9.673e-23, 1.903e-22,
3.153e-10, 2.572e-21, 5.350e-23, 4.522e-22, 1.468e-22,
2.195e-23, 2.493e-22, 1.075e-22, 3.525e-22, 1.541e-23,
1.935e-22, 9.220e-23, ..., ..., ...,
The Fortran code declares two variables to store these data: pg and ping. The first is a 4D matrix and the second is a 1D array whose length equals the number of data points (14500):
parameter(NSIZ=29)
parameter(NTg=5)
parameter(NDg=5)
parameter(NTAUg=20)
real pg(NSIZ,NDg,NTg,NTAUg)
real ping(NSIZ*NDg*NTg*NTAUg)
So far so good, but now I have an equivalence command:
equivalence(pg,ping)
which, as far as I understand it, points the pg matrix to the ping array. The final part is a loop that reads the lines from the data file and assigns them to the ping array:
NCOLs=5
NROWS=NTg*NDg*NTAUg*NSIZ / NCOLs
irow=1
do i=1,NROWS
read(nat,*) (ping((irow-1)*NCOLs+k),k=1,NCOLs)
print *, ping(irow)
irow=irow+1
enddo
I would like to replicate this code in python but I do not understand how the read command is assigning the data to the 4D matrix. Could anyone offer any advice?
First, Fortran equivalence and memory layout. We need to look at memory layout first. I'll describe the 2D case for simplicity; the generalization to arbitrary dimensionality shouldn't be too hard to figure out.
A Fortran array is always a contiguous block of memory¹. The leftmost index varies the most rapidly (known as column-major order). So
a(i, j)
a(i+1, j)
are adjacent in memory. Now, assume we have an array declared as a(N, M). From a memory perspective, this is no different from an array a(N*M), where the mapping of indices is:
memory_index = (j-1)*N + i
a(memory_index)
(the j-1 is because Fortran defaults to 1-based indexing). Ultimately, when you have a 2D array, the compiler translates your statements a(i, j) into a((j-1)*N + i). It might also do some bounds checking (e.g. to make sure that i is between 1 and N), but you usually have to turn that on with a compiler flag because it impacts performance slightly and the Fortran crowd really doesn't like it when their programs slow down ;-).
¹Nearly always. Array slices and F90+ pointers might make that statement untrue in some circumstances. Even then, those "arrays" are backed by a contiguous block of memory allocated to the program...
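If you want to see this mapping in action, NumPy can build a Fortran-ordered array directly. A minimal sketch (the 3x4 shape here is an arbitrary example, not taken from your data):

```python
import numpy as np

N, M = 3, 4  # arbitrary example shape
a = np.arange(1, N * M + 1).reshape((N, M), order='F')  # column-major layout
flat = a.ravel(order='F')  # the same elements, in memory order

# Element a(i, j) (1-based, Fortran style) sits at 1-based linear position
# (j-1)*N + i, i.e. 0-based flat index (j-1)*N + (i-1).
for j in range(1, M + 1):
    for i in range(1, N + 1):
        assert a[i - 1, j - 1] == flat[(j - 1) * N + (i - 1)]
```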
Ok, where are we? Pretty much everything I've said boils down to this: to the compiler, an N-dimensional array is really just a 1-dimensional array with some index remapping.
Now we can start to deal with equivalence. The analogous construct in C would be pointers -- sort of. Your equivalence statement says that the first element of ping is the same memory location as the first element of pg. This allows the user to read the data into ping, which is a flat array, but have it populate the n-dimensional array pg (since ultimately they share the same memory locations). It's worth noting that modern Fortran has pointers. They're not exactly like C pointers, but I think that most Fortran enthusiasts would advise you to use them in new code instead of equivalence.
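The closest NumPy analogue to equivalence is two arrays that are views of the same buffer. A minimal sketch (the sizes here are made up, not your NSIZ/NDg/NTg/NTAUg):

```python
import numpy as np

ping = np.zeros(12)                    # flat 1-D array, analogous to ping
pg = ping.reshape((3, 4), order='F')   # a *view* onto the same memory, analogous to pg

ping[0] = 42.0                         # write through the flat name...
assert pg[0, 0] == 42.0                # ...and the value is visible through the 2-D view
assert np.shares_memory(ping, pg)      # both names refer to one block of memory
```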
Now, how would I read this data in python? I'd probably do something really simple:
def yield_values(filename):
    with open(filename) as f_input:
        for line in f_input:
            for val in line.split(','):
                val = val.strip()
                if val:  # each line ends with a trailing comma, which leaves an empty field
                    yield float(val)
Then you can pack it into a numpy array:
import numpy as np
arr = np.array(list(yield_values('data.dat')))
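As a small optional refinement, np.fromiter consumes the generator directly without building an intermediate list. A self-contained sketch, writing a tiny two-line sample file in the same trailing-comma format (values copied from the question):

```python
import numpy as np

def yield_values(filename):
    with open(filename) as f_input:
        for line in f_input:
            for val in line.split(','):
                val = val.strip()
                if val:  # skip the empty field left by each line's trailing comma
                    yield float(val)

# Write a small sample in the file's format.
with open('sample.dat', 'w') as f:
    f.write("9.832e-06, 2.121e-16, 3.862e-21, 2.677e-23, 1.099e-22,\n")
    f.write("5.314e-23, 1.366e-23, 6.504e-23, 3.936e-23, 1.175e-22,\n")

# dtype is required so fromiter knows the element type up front
arr = np.fromiter(yield_values('sample.dat'), dtype=np.float64)
```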
Finally, you can reshape the array:
# Account for python's Row-major ordering.
arr = arr.reshape((NTAUg, NTg, NDg, NSIZ))
If you prefer fortran's column major ordering:
arr = arr.reshape((NSIZ, NDg, NTg, NTAUg), order='F')
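Both reshapes address the same underlying data: reversing the axes of the row-major version gives exactly the column-major one. A quick consistency check (random values standing in for your file):

```python
import numpy as np

NSIZ, NDg, NTg, NTAUg = 29, 5, 5, 20
flat = np.random.rand(NSIZ * NDg * NTg * NTAUg)  # stand-in for the 14500 values read from the file

c_arr = flat.reshape((NTAUg, NTg, NDg, NSIZ))             # row-major (C) order
f_arr = flat.reshape((NSIZ, NDg, NTg, NTAUg), order='F')  # column-major (Fortran) order

# pg(i, d, t, tau) in Fortran (1-based) is f_arr[i-1, d-1, t-1, tau-1],
# which is the same value as c_arr[tau-1, t-1, d-1, i-1].
assert np.array_equal(c_arr.transpose(3, 2, 1, 0), f_arr)
```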