Background:
A binary file is read on a Linux machine using the following Fortran code:
parameter(nx=720, ny=360, nday=365)
c
dimension tmax(nx,ny,nday),nmax(nx,ny,nday)
dimension tmin(nx,ny,nday),nmin(nx,ny,nday)
c
open(10,
&file='FILE',
&access='direct',recl=nx*ny*4)
c
do k=1,nday
read(10,rec=(k-1)*4+1)((tmax(i,j,k),i=1,nx),j=1,ny)
read(10,rec=(k-1)*4+2)((nmax(i,j,k),i=1,nx),j=1,ny)
read(10,rec=(k-1)*4+3)((tmin(i,j,k),i=1,nx),j=1,ny)
read(10,rec=(k-1)*4+4)((nmin(i,j,k),i=1,nx),j=1,ny)
end do
File Details:
options little_endian
title global daily analysis (grid box mean, the grid shown is the center of the grid box)
undef -999.0
xdef 720 linear 0.25 0.50
ydef 360 linear -89.75 0.50
zdef 1 linear 1 1
tdef 365 linear 01jan2015 1dy
vars 4
tmax 1 00 daily maximum temperature (C)
nmax 1 00 number of reports for maximum temperature (C)
tmin 1 00 daily minimum temperature (C)
nmin 1 00 number of reports for minimum temperature (C)
ENDVARS
Attempts at a solution:
I am trying to parse this into an array in python using the following code (purposely leaving out two attributes):
with gzip.open("/FILE.gz", "rb") as infile:
data = numpy.frombuffer(infile.read(), dtype=numpy.dtype('<f4'), count = -1)
while x <= len(data) / 4:
tmax.append(data[(x-1)*4])
tmin.append(data[(x-1)*4 + 2])
x += 1
data_full = zip(tmax, tmin)
When testing some records, the data does not seem to line up with some sample records from the file when using Fortran. I have also tried dtype=numpy.float32
as well with no success. It seems as though I am reading the file in correctly in terms of number of observations though. I was also using struct
before I learned the file was created with Fortran. That was not working
There are similar questions out here, some of which have answers that I have tried adapting with no luck.
UPDATE
I am a little bit closer after trying out this code:
#Define numpy variables and empty arrays
nx = 720 #number of lon
ny = 360 #number of lat
nday = 0 #iterate up to 364 (or 365 for leap year)
tmax = numpy.empty([0], dtype='<f', order='F')
tmin = numpy.empty([0], dtype='<f', order='F')
#Parse the data into numpy arrays, shifting records as the date increments
while nday < 365:
tmax = numpy.append(tmax, data[(nx*ny)*nday:(nx*ny)*(nday + 1)].reshape((nx,ny), order='F'))
tmin = numpy.append(tmin, data[(nx*ny)*(nday + 2):(nx*ny)*(nday + 3)].reshape((nx,ny), order='F'))
nday += 1
I get the correct data for the first day, but for the second day I get all zeros, the third day the max is lower than the min, and so on.
After the Update in my question, I realize I had an error with how I was looping. I of course spotted this about 10 minutes after issuing a bounty, aw well.
The error is with using the day to iterate through the records. This will not work as it iterates once per loop, not pushing the records far enough. Hence why some mins were higher than maxes. The new code is:
while nday < 365:
tmax = numpy.append(tmax, data[(nx*ny)*rm:(nx*ny)*(rm + 1)].reshape((nx,ny), order='F'))
rm = rm + 2
tmin = numpy.append(tmin, data[(nx*ny)*rm:(nx*ny)*(rm + 1)].reshape((nx,ny), order='F'))
rm = rm + 2
nday += 1
This used a Record Mover (or rm
as I call it) to move the records the appropriate amount. That was all it needed.