I am trying to read a text file in Python as an array or a list of lists. The file looks like the sample below.
Also, here is a link to the text file.
ftp://rammftp.cira.colostate.edu/demaria/ebtrk/ebtrk_atlc.txt
AL0188 ALBERTO 080518 1988 32.0 77.5 20 1015 -99 -99 -99 -99 0 0 0 0 0 0 0 0 0 0 0 0 * 218.
AL0188 ALBERTO 080600 1988 32.8 76.2 20 1014 -99 -99 -99 -99 0 0 0 0 0 0 0 0 0 0 0 0 * 213.
AL0188 ALBERTO 080712 1988 41.5 69.0 35 1002 -99 -99 1012 60 100100 50 50 0 0 0 0 0 0 0 0 * 118.
AL0188 ALBERTO 080718 1988 43.0 67.5 35 1002 -99 -99 1008 50 100100 50 50 0 0 0 0 0 0 0 0 * 144.
AL0188 ALBERTO 080800 1988 45.0 65.5 35 1004 -99 -99 1008 50 -99-99-99-99 0 0 0 0 0 0 0 0 * 22.
AL0188 ALBERTO 080806 1988 47.0 63.0 35 1006 -99 -99 1008 50 -99-99-99-99 0 0 0 0 0 0 0 0 * 64.
I have tried using NumPy's genfromtxt, but it returned an error because it couldn't tell that, for example, 100100 is two values in two adjacent columns. It treated it as a single entry, and so raised an error saying the number of columns in each row didn't match.
Is there some way to fix this? Thank you.
You can pass the column widths (in characters) as the delimiter argument. Example:
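import numpy as np

# Each entry is the width, in characters, of one fixed-width column.
widths = [7, 10, 7, 4, 5, 6, 4, 5, 4, 4, 5, 4, 4, 3, 3, 3]
with open('ebtrk_atlc.txt') as f:
    # dtype=None lets genfromtxt infer the type of each column.
    data = np.genfromtxt(f, dtype=None, delimiter=widths)
print(data)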
will give as output (omitting the first few lines)
('AL0188 ', 'ALBERTO ', 80712, 1988, 41.5, 69.0, 35, 1002, -99, -99, 1012, 60, 100, 100, 50, 50)
('AL0188 ', 'ALBERTO ', 80718, 1988, 43.0, 67.5, 35, 1002, -99, -99, 1008, 50, 100, 100, 50, 50)
('AL0188 ', 'ALBERTO ', 80800, 1988, 45.0, 65.5, 35, 1004, -99, -99, 1008, 50, -99, -99, -99, -99)
As you can see, the 100100 field got separated. Of course, you have to supply the correct field types and widths; this example just demonstrates that it is possible. For example, changing the code to
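import numpy as np

# Column widths in characters. They are kept separate from the dtype because
# the number in a numeric type code such as i4 or f8 is a size in bytes,
# not a text width.
widths = [7, 10, 7, 4, 5, 6, 4, 5, 4, 4, 5, 4, 4, 3, 3, 3]
# Three strings, the year, two floats, then ten integers.
dt = "U7,U10,U7,i4,f8,f8" + ",i4" * 10
with open('ebtrk_atlc.txt') as f:
    data = np.genfromtxt(f, dtype=dt, delimiter=widths, autostrip=True)
print(data)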
will change the result to
('AL0188', 'ALBERTO', '080712', 1988, 41.5, 69.0, 35, 1002, -99, -99, 1012, 60, 100, 100, 50, 50)
('AL0188', 'ALBERTO', '080718', 1988, 43.0, 67.5, 35, 1002, -99, -99, 1008, 50, 100, 100, 50, 50)
('AL0188', 'ALBERTO', '080800', 1988, 45.0, 65.5, 35, 1004, -99, -99, 1008, 50, -99, -99, -99, -99)
This strips the whitespace around the strings and explicitly sets the types, so the date column stays a string and the two coordinate columns are floats. Further details are in the numpy.genfromtxt documentation; check the fixed-width example at the bottom.
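If it helps to see what the width list is doing, here is a rough plain-Python sketch of the same idea: cut each line into consecutive slices of the given widths. This is only an illustration, not how genfromtxt is implemented. The helper name split_fixed_width is made up for this example, and the padding of the sample line is an assumption chosen to match the widths above, since the exact spacing of the real file is not shown in the post.

```python
from itertools import accumulate

# Column widths in characters, taken from the answer above.
WIDTHS = [7, 10, 7, 4, 5, 6, 4, 5, 4, 4, 5, 4, 4, 3, 3, 3]

# Hypothetical sample line; its padding is assumed so that it matches WIDTHS.
LINE = "AL0188 ALBERTO   080712 1988 41.5  69.0  35 1002 -99 -99 1012  60 100100 50 50"


def split_fixed_width(line, widths):
    """Cut line into consecutive slices of the given widths and strip padding."""
    ends = list(accumulate(widths))   # end offset of every column
    starts = [0] + ends[:-1]          # each column starts where the previous one ended
    return [line[s:e].strip() for s, e in zip(starts, ends)]


fields = split_fixed_width(LINE, WIDTHS)
# The run "100100 50 50" comes back as four separate fields.
print(fields[12:16])
```

Because the split is by position rather than by whitespace, two values that touch each other (like 100100) still end up in separate columns, as long as the widths are right.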