I have data in the file data.dat of the form:
column_1 col col col col col
1 2 3 1 2 3
4 3 2 3 2 4
1 4 3 1 4 3
5 6 4 5 6 4
And I am trying to import it using np.genfromtxt, so that all data with column name col is stored in the variable y. I tried it with this code:
import numpy as np
data = np.genfromtxt('data.dat', comments='#', delimiter='\t', dtype=None, names=True).transpose()
y = data['col']
But it gives me the following error:
ValueError: two fields with the same name
How can this be solved in Python?
When you use names=True, np.genfromtxt returns a structured array. Notice that the columns labelled col in data.dat get disambiguated to column names of the form col_n:
In [114]: arr = np.genfromtxt('data.dat', comments='#', delimiter='\t', dtype=None, names=True)
In [115]: arr
Out[115]:
array([(1, 2, 3, 1, 2, 3), (4, 3, 2, 3, 2, 4), (1, 4, 3, 1, 4, 3),
(5, 6, 4, 5, 6, 4)],
dtype=[('column_1', '<i8'), ('col', '<i8'), ('col_1', '<i8'), ('col_2', '<i8'), ('col_3', '<i8'), ('col_4', '<i8')])
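In particular, arr['col'] now refers to only the first of the duplicated columns:
In [116]: arr['col']
Out[116]: array([2, 3, 4, 6])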
So once you use names=True it becomes harder to select all the data associated with the column name col. Moreover, the structured array does not let you slice out several columns as an ordinary 2-D array in one step (though see the sketch at the end of this answer). So it would be more convenient to instead load the data into an array of homogeneous dtype, which is what you would get without names=True:
with open('data.dat', 'rb') as f:
    # decode the header bytes so the names compare equal to the str 'col' below
    header = f.readline().decode().strip().split('\t')
    arr = np.genfromtxt(f, comments='#', delimiter='\t', dtype=None)
Then you can find the numerical indices of those columns whose name is col:
idx = [i for i, col in enumerate(header) if col=='col']
and select all the data with
y = arr[:, idx]
For example,
import numpy as np

with open('data.dat', 'rb') as f:
    # decode the header bytes so the names compare equal to the str 'col' below
    header = f.readline().decode().strip().split('\t')
    arr = np.genfromtxt(f, comments='#', delimiter='\t', dtype=None)

idx = [i for i, col in enumerate(header) if col == 'col']
y = arr[:, idx]
print(y)
yields
[[2 3 1 2 3]
[3 2 3 2 4]
[4 3 1 4 3]
[6 4 5 6 4]]
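As a variation, since genfromtxt accepts a usecols parameter, you could restrict the parse to those columns at load time instead of slicing afterwards; a minimal sketch along the same lines:

import numpy as np

with open('data.dat', 'rb') as f:
    header = f.readline().decode().strip().split('\t')
    # positions of every column named 'col'
    idx = [i for i, col in enumerate(header) if col == 'col']
    # usecols makes genfromtxt parse only the requested columns
    y = np.genfromtxt(f, comments='#', delimiter='\t', dtype=None, usecols=idx)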
If you want y to be 1-dimensional, you could use ravel(), which flattens the array row by row by default:
print(y.ravel())
yields
[2 3 1 2 3 3 2 3 2 4 4 3 1 4 3 6 4 5 6 4]
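Finally, if you do want to stick with the structured array that names=True produces, one workaround is to collect the auto-generated field names and stack those fields into a plain 2-D array; a minimal sketch, assuming the names col, col_1, ..., col_4 shown above:

import numpy as np

arr = np.genfromtxt('data.dat', comments='#', delimiter='\t', dtype=None, names=True)
# gather the original 'col' field plus its disambiguated duplicates
# (this name-matching is a heuristic based on the dtype shown above)
fields = [name for name in arr.dtype.names
          if name == 'col' or name.startswith('col_')]
# stack the selected fields column by column into a regular 2-D array
y = np.column_stack([arr[name] for name in fields])
print(y)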