
Using numpy genfromtxt to read in data in single column to multiple columns using text headers


I am trying to import some data (pressures, stresses) for a set of pre-defined x,y points from a file using genfromtxt. The data is output as one long column, split into sections by header names, for example:

time
1.0022181

PORE_PRE
-18438721.41
-18438721.41
........

STRS_11
-28438721.41
-28438721.41
........

The time data is only one point, but the PORE_PRE and STRS_11 and other variables contain many but equal numbers of data points. I use the following code:

import numpy as np
import matplotlib.pyplot as plt


file1=open('Z:/EFNHigh_Res/data_tstep1.out','r')
time=np.genfromtxt(file1,names=None,dtype=None,autostrip=True)

With this code I get an array with all of the data in one column. I have managed to remove the time by deleting the first two rows.

My initial idea was to then reshape the array, using the number of data points per variable (which I found previously) and the total number of data points in the column. For example:

xx=np.reshape(time3,307,4)
print xx

However, I get the error below and can't find a way to reshape it. I am guessing it's not possible because of the 1D nature of the array.

 File "Z:\EFNHigh_Res\plotting.py", line 39, in <module>
    xx=np.reshape(time3,307,4)
  File "C:\Python27\ArcGIS10.2\lib\site-packages\numpy\core\fromnumeric.py",line 171, in reshape
    return reshape(newshape, order=order)
ValueError: total size of new array must be unchanged

I don't have much choice over the output format (other than a more complicated arrangement). It seems like it should be a simple operation, but I am very new to Python and can't figure it out. I have also tried to view only the floating point data using the following code, but I get either the error below or a very large number of data points, more than the array contains.

xx=time3.view(dtype=np.float)
ValueError: new type not compatible with array

Can anyone suggest how I could deal with reading the file in?


Solution

  • You need to read the file in blocks. genfromtxt accepts input from any iterable: a list of strings, a generator, an open file, etc. So you need a script that opens the file, reads the lines of one block, calls genfromtxt on those lines, and saves the result in a list. At the end you can collect those subarrays into one array.

    https://stackoverflow.com/a/34729730/901925 has a simple example using readlines. Working from a list of lines is the easiest way to develop your ideas - finding boundaries of blocks, etc. You can rework it later into a generator or filter structure if you don't want the full file in memory.

    https://stackoverflow.com/a/35495412/901925 has an extended discussion on merging structured arrays.

    Sample script:

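    import numpy as np
    import numpy.lib.recfunctions as rfn

    # Read the whole file into a list of lines
    with open('stack35510689.txt') as f:
        lines = f.readlines()
    print(lines)

    # The second line holds the single time value
    time = float(lines[1].strip())
    print(time)

    # Each block is a header line followed by its values
    arr1 = np.genfromtxt(lines[3:6], names=True)
    print(repr(arr1))
    arr2 = np.genfromtxt(lines[7:10], names=True)
    print(repr(arr2))

    # Merge the single-field arrays into one structured array
    print(repr(rfn.merge_arrays([arr1, arr2])))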
    

    sample source

    time
    1.0022181
    
    PORE_PRE
    -18438721.41
    -18438721.41
    
    STRS_11
    -28438721.41
    -28438721.41
    

    sample output

    1009:~/mypy$ python stack35510689.py
    ['time\n', '1.0022181\n', '\n', 'PORE_PRE\n', '-18438721.41\n', '-18438721.41\n', '\n', 'STRS_11\n', '-28438721.41\n', '-28438721.41\n']
    1.0022181
    array([(-18438721.41,), (-18438721.41,)], 
          dtype=[('PORE_PRE', '<f8')])
    array([(-28438721.41,), (-28438721.41,)], 
          dtype=[('STRS_11', '<f8')])
    array([(-18438721.41, -28438721.41), (-18438721.41, -28438721.41)], 
          dtype=[('PORE_PRE', '<f8'), ('STRS_11', '<f8')])
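
The slice indices in the sample script (lines[3:6], lines[7:10]) only fit this small file. Below is a minimal sketch of finding the block boundaries automatically, assuming every block is one header line followed by one number per line, and that blocks are separated by blank lines. The helper name read_blocks is hypothetical, io.StringIO stands in for the open file, and plain float() conversion is used in place of a genfromtxt call per block.

```python
import io
import numpy as np


def read_blocks(lines):
    """Split blank-line-separated blocks into (header, values) pairs.

    Assumes each block is one header line followed by one number per line.
    """
    blocks = []
    current = []
    for line in lines:
        text = line.strip()
        if text:
            current.append(text)
        elif current:
            # A blank line closes the block collected so far
            blocks.append(current)
            current = []
    if current:
        # The last block may not be followed by a blank line
        blocks.append(current)
    return [(block[0], np.array([float(v) for v in block[1:]]))
            for block in blocks]


# Stand-in for open('stack35510689.txt'); same content as the sample source
sample = io.StringIO(
    u"time\n1.0022181\n\n"
    u"PORE_PRE\n-18438721.41\n-18438721.41\n\n"
    u"STRS_11\n-28438721.41\n-28438721.41\n"
)

blocks = read_blocks(sample)
time = blocks[0][1][0]                      # the 'time' block holds one value
names = [name for name, _ in blocks[1:]]    # remaining headers, in file order
# One column per variable; requires equal-length blocks, as in the question
table = np.column_stack([values for _, values in blocks[1:]])

print(time)
print(names)
print(table.shape)
```

Because read_blocks only iterates over its input once, it should also work directly on an open file object, so the full list of lines does not have to be held separately.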
    

    Reading the same file with a single genfromtxt call produces a 1d array of strings:

    In [819]: data=np.genfromtxt('stack35510689.txt',names=None,dtype=None,autostrip=True)
    In [820]: data
    Out[820]: 
    array(['time', '1.0022181', 'PORE_PRE', '-18438721.41', '-18438721.41',
           'STRS_11', '-28438721.41', '-28438721.41'], 
          dtype='|S12')
    

    If I change the dtype to float I get numbers, with nan where the strings were:

    In [821]: data=np.genfromtxt('stack35510689.txt',names=None,dtype=float,autostrip=True)
    
    In [822]: data
    Out[822]: 
    array([             nan,   1.00221810e+00,              nan,
            -1.84387214e+07,  -1.84387214e+07,              nan,
            -2.84387214e+07,  -2.84387214e+07])
    

    I could collect the numbers from that with slicing:

    In [826]: np.array([data[3:5],data[6:8]])
    Out[826]: 
    array([[-18438721.41, -18438721.41],
           [-28438721.41, -28438721.41]])
    

    Or I could build a structured array like before:

    In [827]: x=np.zeros((2,),dtype=[('PORE_PRE', '<f8'), ('STRS_11', '<f8')])
    In [828]: x['PORE_PRE']=data[3:5]
    In [829]: x['STRS_11']=data[6:8]
    In [830]: x
    Out[830]: 
    array([(-18438721.41, -28438721.41), (-18438721.41, -28438721.41)], 
          dtype=[('PORE_PRE', '<f8'), ('STRS_11', '<f8')])
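
With the float read, the nan positions can also stand in for the hand-written slice bounds. This is only a sketch: it assumes nan appears only where a header line was (a genuinely missing value would be mistaken for a header), and it rebuilds the float array by hand instead of reading the file.

```python
import numpy as np

# Same values as the dtype=float read above; nan marks each header line
data = np.array([np.nan, 1.0022181,
                 np.nan, -18438721.41, -18438721.41,
                 np.nan, -28438721.41, -28438721.41])

# Positions of the header lines
header_rows = np.flatnonzero(np.isnan(data))

# np.split cuts before each index, so the first piece is empty and every
# other piece starts with its header's nan; drop both
pieces = [piece[1:] for piece in np.split(data, header_rows)[1:]]

time = pieces[0][0]                   # first block is the single time value
table = np.column_stack(pieces[1:])   # one column per remaining block

print(header_rows)
print(table.shape)
```

The header names are lost in this approach, so they would have to come from the string read or from a known, fixed order.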