I am trying to import some data (pressures, stresses) for a set of pre-defined x,y points from a file using genfromtxt. The data is output as one long column, split up by header names, for example:
time
1.0022181
PORE_PRE
-18438721.41
-18438721.41
........
STRS_11
-28438721.41
-28438721.41
........
The time data is only one point, but PORE_PRE, STRS_11 and the other variables each contain many data points, with the same number in every block. I use the following code:
import numpy as np
import matplotlib.pyplot as plt
file1=open('Z:/EFNHigh_Res/data_tstep1.out','r')
time=np.genfromtxt(file1,names=None,dtype=None,autostrip=True)
With this code I get a structured array with all of the data in one column. I have managed to remove the time by deleting the first two rows.
My initial idea was to then reshape the array using information relating to the number of data points which I have found previously and the total number of data points in the column. For example:
xx=np.reshape(time3,307,4)
print xx
However, I get the error below and can't find a way to reshape the array. I am guessing it is not possible because of the 1D nature of the array.
File "Z:\EFNHigh_Res\plotting.py", line 39, in <module>
xx=np.reshape(time3,307,4)
File "C:\Python27\ArcGIS10.2\lib\site-packages\numpy\core\fromnumeric.py",line 171, in reshape
return reshape(newshape, order=order)
ValueError: total size of new array must be unchanged
I don't have much choice over the output format (other than a more complicated arrangement). It seems like it should be a simple operation, but I am very new to Python and can't figure it out. I have also tried to view only the floating point data using the following code, but I get either the error below or a very large number of data points, more than the array contains.
xx=time3.view(dtype=np.float)
ValueError: new type not compatible with array
Can anyone suggest how I could deal with reading the file in?
You need to read the file in blocks. genfromtxt accepts input from any iterable: a list of strings, a generator, an open file, etc. So you need a script that opens the file, reads the lines of a block, and calls genfromtxt on those, saving the result in a list. At the end you can collect those subarrays into one array.
https://stackoverflow.com/a/34729730/901925 has a simple example using readlines. Working from a list of lines is the easiest way to develop your ideas: finding boundaries of blocks, etc. You can rework it later into a generator or filter structure if you don't want the full file in memory.
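As a rough sketch of the generator idea (not taken from the linked answers), something like the following could yield one block at a time. The name iter_blocks is hypothetical, io.StringIO stands in for the open file, and it assumes that every non-blank line that does not parse as a float is a header:

```python
import io
import numpy as np

def iter_blocks(fileobj):
    """Yield (header, values) pairs, one block at a time.

    Assumption: any non-blank line that float() rejects is a header.
    """
    name, values = None, []
    for line in fileobj:
        text = line.strip()
        if not text:
            continue                      # skip blank separator lines
        try:
            values.append(float(text))
        except ValueError:
            if name is not None:
                yield name, np.array(values)
            name, values = text, []       # start a new block
    if name is not None:
        yield name, np.array(values)      # last block in the file

# Stand-in for open('data_tstep1.out'); only one line is held at a time.
sample = io.StringIO(u"time\n1.0022181\n\nPORE_PRE\n-18438721.41\n-18438721.41\n"
                     u"\nSTRS_11\n-28438721.41\n-28438721.41\n")
result = dict(iter_blocks(sample))
```

Here result maps each header to a 1d float array, so result['time'][0] is the time value and the other entries are the per-variable columns.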
https://stackoverflow.com/a/35495412/901925 has an extended discussion on merging structured arrays.
Sample script:
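import numpy as np
import numpy.lib.recfunctions as rfn

# read the whole file into a list of lines
lines = open('stack35510689.txt').readlines()
print lines
# the single time value sits on the line after the 'time' header
time = float(lines[1].strip())
print time
# each block: one header line (used as the field name) plus its values
arr1 = np.genfromtxt(lines[3:6], names=True)
print repr(arr1)
arr2 = np.genfromtxt(lines[7:10], names=True)
print repr(arr2)
# combine the single-field arrays into one structured array
print repr(rfn.merge_arrays([arr1,arr2]))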
sample source
time
1.0022181

PORE_PRE
-18438721.41
-18438721.41

STRS_11
-28438721.41
-28438721.41
sample output
1009:~/mypy$ python stack35510689.py
['time\n', '1.0022181\n', '\n', 'PORE_PRE\n', '-18438721.41\n', '-18438721.41\n', '\n', 'STRS_11\n', '-28438721.41\n', '-28438721.41\n']
1.0022181
array([(-18438721.41,), (-18438721.41,)],
dtype=[('PORE_PRE', '<f8')])
array([(-28438721.41,), (-28438721.41,)],
dtype=[('STRS_11', '<f8')])
array([(-18438721.41, -28438721.41), (-18438721.41, -28438721.41)],
dtype=[('PORE_PRE', '<f8'), ('STRS_11', '<f8')])
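The slices in the script are hard-coded for the small sample. For a real file the block boundaries could be located first, roughly as in the sketch below. It assumes that a header is any non-blank line that does not parse as a float; the helper is_number is hypothetical, and the lines list is the one printed above.

```python
import numpy as np

def is_number(text):
    """Return True if text parses as a float (hypothetical helper)."""
    try:
        float(text)
        return True
    except ValueError:
        return False

lines = ['time\n', '1.0022181\n', '\n', 'PORE_PRE\n', '-18438721.41\n',
         '-18438721.41\n', '\n', 'STRS_11\n', '-28438721.41\n', '-28438721.41\n']

# Assumption: a header is any non-blank line that is not a number.
starts = [i for i, ln in enumerate(lines) if ln.strip() and not is_number(ln)]
ends = starts[1:] + [len(lines)]

blocks = {}
for start, end in zip(starts, ends):
    name = lines[start].strip()
    # genfromtxt skips the blank separator lines; atleast_1d keeps the
    # single-value 'time' block as a 1d array instead of a 0d one.
    blocks[name] = np.atleast_1d(np.genfromtxt(lines[start + 1:end]))

time = blocks.pop('time')[0]
# One column per variable, one row per data point.
table = np.column_stack([blocks['PORE_PRE'], blocks['STRS_11']])
```

With equal-length blocks, table has shape (number of points, number of variables), which is the layout the reshape in the question was aiming for.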
Reading the same file with one genfromtxt produces a 1d array of strings:
In [819]: data=np.genfromtxt('stack35510689.txt',names=None,dtype=None,autostrip=True)
In [820]: data
Out[820]:
array(['time', '1.0022181', 'PORE_PRE', '-18438721.41', '-18438721.41',
'STRS_11', '-28438721.41', '-28438721.41'],
dtype='|S12')
If I change the dtype to float I get numbers, with nan where the strings were:
In [821]: data=np.genfromtxt('stack35510689.txt',names=None,dtype=float,autostrip=True)
In [822]: data
Out[822]:
array([ nan, 1.00221810e+00, nan,
-1.84387214e+07, -1.84387214e+07, nan,
-2.84387214e+07, -2.84387214e+07])
I could collect the numbers from that with slicing:
In [826]: np.array([data[3:5],data[6:8]])
Out[826]:
array([[-18438721.41, -18438721.41],
[-28438721.41, -28438721.41]])
Or I could fill a structured array like before:
In [827]: x=np.zeros((2,),dtype=[('PORE_PRE', '<f8'), ('STRS_11', '<f8')])
In [828]: x['PORE_PRE']=data[3:5]
In [829]: x['STRS_11']=data[6:8]
In [830]: x
Out[830]:
array([(-18438721.41, -28438721.41), (-18438721.41, -28438721.41)],
dtype=[('PORE_PRE', '<f8'), ('STRS_11', '<f8')])
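The slice indices do not have to be typed by hand; they could be derived from the nan positions. This is only a sketch, assuming the data itself contains no genuine nan values (those would be mistaken for headers) and that all variable blocks have the same length. The data array is retyped here so the snippet runs on its own.

```python
import numpy as np

# Float array as returned by genfromtxt(..., dtype=float): nan marks a header.
data = np.array([np.nan, 1.0022181, np.nan, -18438721.41, -18438721.41,
                 np.nan, -28438721.41, -28438721.41])

# Positions of the header lines.
header_idx = np.flatnonzero(np.isnan(data))

# Split just before each header, drop any empty leading piece,
# then drop the nan that starts each remaining piece.
blocks = [piece[1:] for piece in np.split(data, header_idx) if piece.size]

time = blocks[0][0]              # first block holds the single time value
values = np.array(blocks[1:])    # shape (n_variables, n_points)
```

values.T then gives one row per data point and one column per variable.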