I have the following data set, in which the columns have different lengths:
5.0 0.4 0.92 11.45 44.18 33.66
3.2 4.92 7.2 11.73 46.98 118.63
3.6 11.43 14.32 8.88 71.3
1.99 9.12 11.71 15.56 20.24
0.77 21.92 2.47 33.99 80.68
0.91 4.32 14.6 15.69 127.8
2.67 2.1 5.14 7.96 46.88
0.76 0.44 5.46 71.13 16.62
3.52 1.15 6.21 31.84 10.33
0.93 2.29 0.83 58.0 18.32
0.56 1.61 5.09 20.07 10.1
0.02 1.23 5.95 16.24
1.5 3.23 4.21 18.9
I have tried using genfromtxt, but it is returning only the first row in the data.
data = np.genfromtxt(filename,dtype=float,usecols=range(6))
Is there an argument I am missing that would fix this? If I don't use the usecols argument, the data is returned as one column instead. Setting delimiter='' gave the same result. Ideally, I would like to read in the data and then separate it by column.
NumPy arrays must be regular (every row the same length), so genfromtxt is not designed for this. For such data, pandas is probably easier to use; it fills missing values with NaN by default:
In [7]: df = pd.read_csv('file.txt', sep=r'\s+', header=None); df
Out[7]:
       0      1      2      3       4       5
0   5.00   0.40   0.92  11.45   44.18   33.66
1   3.20   4.92   7.20  11.73   46.98  118.63
2   3.60  11.43  14.32   8.88   71.30     NaN
3   1.99   9.12  11.71  15.56   20.24     NaN
4   0.77  21.92   2.47  33.99   80.68     NaN
5   0.91   4.32  14.60  15.69  127.80     NaN
6   2.67   2.10   5.14   7.96   46.88     NaN
7   0.76   0.44   5.46  71.13   16.62     NaN
8   3.52   1.15   6.21  31.84   10.33     NaN
9   0.93   2.29   0.83  58.00   18.32     NaN
10  0.56   1.61   5.09  20.07   10.10     NaN
11  0.02   1.23   5.95  16.24     NaN     NaN
12  1.50   3.23   4.21  18.90     NaN     NaN
You can get back to a NumPy array with df.values.
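Since the question also asks about separating the data by column, here is a minimal sketch of the whole round trip. It assumes a reasonably recent pandas, where `to_numpy()` is the preferred spelling of `df.values`. The in-memory `sample` is a stand-in for `file.txt`, using three rows from the question, and the names `arr` and `columns` are only illustrative.

```python
import io

import numpy as np
import pandas as pd

# Stand-in for 'file.txt': three rows from the question, with ragged lengths.
sample = io.StringIO(
    "5.0 0.4 0.92 11.45 44.18 33.66\n"
    "3.6 11.43 14.32 8.88 71.3\n"
    "0.02 1.23 5.95 16.24\n"
)

# The column count is inferred from the first row, so that row must be the
# longest one (as it is here); shorter rows are padded with NaN.
df = pd.read_csv(sample, sep=r"\s+", header=None)

# Regular 2-D float array, with NaN wherever a row was short.
arr = df.to_numpy()

# One 1-D array per column, with the NaN padding dropped.
columns = [df[c].dropna().to_numpy() for c in df.columns]

print(arr.shape)
print([len(c) for c in columns])
```

Note that `dropna()` removes every NaN in a column, so this only suits files where a NaN can mean nothing other than "row too short".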