python python-2.7 numpy genfromtxt

Opening a text file in Python as an array or list of lists


I am trying to open a text file in Python as an array or a list of lists. A sample of the file is shown below; the full file is available at
ftp://rammftp.cira.colostate.edu/demaria/ebtrk/ebtrk_atlc.txt

AL0188 ALBERTO   080518 1988 32.0  77.5  20 1015 -99 -99  -99 -99   0  0  0  0   0  0  0  0   0  0  0  0 *   218.  
AL0188 ALBERTO   080600 1988 32.8  76.2  20 1014 -99 -99  -99 -99   0  0  0  0   0  0  0  0   0  0  0  0 *   213.  
AL0188 ALBERTO   080712 1988 41.5  69.0  35 1002 -99 -99 1012  60 100100 50 50   0  0  0  0   0  0  0  0 *   118.  
AL0188 ALBERTO   080718 1988 43.0  67.5  35 1002 -99 -99 1008  50 100100 50 50   0  0  0  0   0  0  0  0 *   144.  
AL0188 ALBERTO   080800 1988 45.0  65.5  35 1004 -99 -99 1008  50 -99-99-99-99   0  0  0  0   0  0  0  0 *    22.  
AL0188 ALBERTO   080806 1988 47.0  63.0  35 1006 -99 -99 1008  50 -99-99-99-99   0  0  0  0   0  0  0  0 *    64.  

I tried NumPy's genfromtxt, but it raised an error because it cannot tell that, for example, 100100 is two values in two adjacent columns. It treats 100100 as a single entry, so the number of columns differs from row to row.

Is there some way to fix this? Thank you


Solution

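  • You can pass the field widths as the delimiter argument. When delimiter is a list of integers, genfromtxt reads the file as fixed-width columns instead of splitting on whitespace. Example: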

    import numpy as np

    # A list of integers as delimiter means fixed-width fields:
    # each entry is the width of one column, in characters.
    data = np.genfromtxt('ebtrk_atlc.txt',
                         dtype=None,
                         delimiter=[7, 10, 7, 4, 5, 6, 4, 5, 4, 4, 5, 4, 4, 3, 3, 3])
    print(data)
    

    which gives the following output (first few lines omitted):

    ('AL0188 ', 'ALBERTO   ', 80712, 1988, 41.5, 69.0, 35, 1002, -99, -99, 1012, 60, 100, 100, 50, 50)
    ('AL0188 ', 'ALBERTO   ', 80718, 1988, 43.0, 67.5, 35, 1002, -99, -99, 1008, 50, 100, 100, 50, 50)
    ('AL0188 ', 'ALBERTO   ', 80800, 1988, 45.0, 65.5, 35, 1004, -99, -99, 1008, 50, -99, -99, -99, -99)
    

    As you can see, the 100100 field is split into two values. You still have to supply the correct field types and widths yourself; this example only demonstrates that it is possible. For example, changing the code to

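    import numpy as np

    # Column widths in characters, one entry per field.
    widths = [7, 10, 7, 4, 5, 6, 4, 5, 4, 4, 5, 4, 4, 3, 3, 3]
    # Matching NumPy types: three byte strings, then integers and floats.
    # The number after the letter is a size in bytes, not a column width.
    dt = "S7,S10,S7,i4,f8,f8,i4,i4,i4,i4,i4,i4,i4,i4,i4,i4"
    data = np.genfromtxt('ebtrk_atlc.txt',
                         dtype=dt,
                         delimiter=widths,
                         autostrip=True)  # strip whitespace around each field
    print(data)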
    

    will change the result to

    ('AL0188', 'ALBERTO', '080712', 1988, 41.5, 69.0, 35, 1002, -99, -99, 1012, 60, 100, 100, 50, 50)
    ('AL0188', 'ALBERTO', '080718', 1988, 43.0, 67.5, 35, 1002, -99, -99, 1008, 50, 100, 100, 50, 50)
    ('AL0188', 'ALBERTO', '080800', 1988, 45.0, 65.5, 35, 1004, -99, -99, 1008, 50, -99, -99, -99, -99)
    

    This strips the whitespace around the strings and explicitly sets some column types to float. Further documentation can be found in the NumPy reference for genfromtxt; check the fixed-width example at the bottom.
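
    If a plain list of lists is enough, the same fixed-width idea can be sketched without NumPy by slicing each line at the column widths. This is only an illustration: the helper name split_fixed and the two-row SAMPLE are assumptions made for the example (the rows are copied from the question and cut off after the sixteenth column), and every field is left as a string.

```python
# Two sample rows from the question, truncated after the 16th column.
# The second row contains the fused "100100" field.
SAMPLE = [
    "AL0188 ALBERTO   080518 1988 32.0  77.5  20 1015 -99 -99  -99 -99   0  0  0  0",
    "AL0188 ALBERTO   080712 1988 41.5  69.0  35 1002 -99 -99 1012  60 100100 50 50",
]

# Same column widths as in the genfromtxt call above.
WIDTHS = [7, 10, 7, 4, 5, 6, 4, 5, 4, 4, 5, 4, 4, 3, 3, 3]


def split_fixed(line, widths):
    """Cut `line` into whitespace-stripped fields of the given widths."""
    fields = []
    start = 0
    for width in widths:
        fields.append(line[start:start + width].strip())
        start += width
    return fields


rows = [split_fixed(line, WIDTHS) for line in SAMPLE]

# Splitting on whitespace miscounts the second row ...
print([len(line.split()) for line in SAMPLE])  # [16, 15]
# ... while fixed-width slicing yields 16 fields for both rows.
print([len(row) for row in rows])              # [16, 16]
print(rows[1][12:16])                          # ['100', '100', '50', '50']
```

    Converting the fields to int or float is left out here; with genfromtxt that conversion is what the dtype argument handles.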