How to read a file with mixed data types into a NumPy array in Python?
I'm a new Python learner. I'm trying to read an existing file with mixed data types into a NumPy array.
The content of the file data.txt is shown below (if a comma is not a good delimiter, it can be replaced by a space):
,'A','B','C','D'
'A', 0, 3, 5, -1
'B', 3, 0, 1, 6
'C', 5, 1, 0, 2
'D', -1, 6, 2, 0
The expected output numpy array is as follows:
array([[None,'A','B','C','D'],
['A', 0, 3, 5, -1 ],
['B', 3, 0, 1, 6 ],
['C', 5, 1, 0, 2 ],
['D', -1, 6, 2, 0 ]])
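If you really do want that exact object array (mixed None, str, and int), one option is to parse the file by hand with the csv module and build an object array yourself. A sketch, with the file contents inlined so it runs on its own; swap in open('data.txt', newline='') to read the real file:

```python
import csv
from io import StringIO
import numpy as np

# Inline copy of data.txt so this sketch is self-contained.
data = """,'A','B','C','D'
'A', 0, 3, 5, -1
'B', 3, 0, 1, 6
'C', 5, 1, 0, 2
'D', -1, 6, 2, 0
"""

rows = []
for i, row in enumerate(csv.reader(StringIO(data), skipinitialspace=True)):
    if i == 0:
        # Header row: the empty leading cell becomes None.
        rows.append([None] + [c.strip("'") for c in row[1:]])
    else:
        # Data rows: strip quotes from the label, convert the rest to int.
        rows.append([row[0].strip("'")] + [int(c) for c in row[1:]])

arr = np.array(rows, dtype=object)
```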
You could use pandas.read_csv:
>>> import pandas as pd
>>> df = pd.read_csv('data.txt', index_col=0, sep=',')
>>> print(df)
'A' 'B' 'C' 'D'
'A' 0 3 5 -1
'B' 3 0 1 6
'C' 5 1 0 2
'D' -1 6 2 0
You can then access the underlying array with .values:
>>> df.values
array([[ 0, 3, 5, -1],
[ 3, 0, 1, 6],
[ 5, 1, 0, 2],
[-1, 6, 2, 0]], dtype=int64)
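If you then want to rebuild the full 5x5 object array including the labels, you can stitch the index and columns back around the values. A sketch (the file contents are inlined here so it runs standalone; the quote-stripping is an assumption about how you want the labels to look):

```python
from io import StringIO
import numpy as np
import pandas as pd

# Inline copy of data.txt; replace StringIO(data) with 'data.txt' for the real file.
data = """,'A','B','C','D'
'A', 0, 3, 5, -1
'B', 3, 0, 1, 6
'C', 5, 1, 0, 2
'D', -1, 6, 2, 0
"""
df = pd.read_csv(StringIO(data), index_col=0, sep=',')

# Assemble labels + values into one object array.
full = np.empty((df.shape[0] + 1, df.shape[1] + 1), dtype=object)
full[0, 0] = None
full[0, 1:] = [c.strip("'") for c in df.columns]
full[1:, 0] = [r.strip("'") for r in df.index]
full[1:, 1:] = df.to_numpy()
```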
To my knowledge it's not possible to read that file into a plain (non-object) 2D array, because a record array requires each column to have a single consistent type. The data rows all share the pattern (str, int, int, int, int), but the header row (NoneType, str, str, str, str) breaks it. With pandas, at least, you can interpret the first row and first column as indices, which are allowed to have a different type.
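To illustrate the per-column typing: a structured array handles the data rows fine once the header is dropped, since each column then has one type. A sketch using np.genfromtxt with explicit field types (the field names 'label', 'a', 'b', 'c', 'd' are made up for the example):

```python
from io import StringIO
import numpy as np

# Data rows only (header dropped); first column str, the rest int.
data = """'A', 0, 3, 5, -1
'B', 3, 0, 1, 6
'C', 5, 1, 0, 2
'D', -1, 6, 2, 0
"""
rec = np.genfromtxt(
    StringIO(data), delimiter=',',
    dtype=[('label', 'U3'), ('a', int), ('b', int), ('c', int), ('d', int)],
)
```

Each row becomes one record, so mixed types per row are fine as long as each column is homogeneous.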
However, if you don't need the first row and column, you could use np.loadtxt:
>>> import numpy as np
>>> np.loadtxt('data.txt', delimiter=',', skiprows=1, usecols=[1,2,3,4], dtype=int)
array([[ 0, 3, 5, -1],
[ 3, 0, 1, 6],
[ 5, 1, 0, 2],
[-1, 6, 2, 0]])
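np.genfromtxt works the same way for the numeric block and is a bit more forgiving of missing values; a sketch with the file contents inlined for self-containment:

```python
from io import StringIO
import numpy as np

# Inline copy of data.txt; pass 'data.txt' instead to read the real file.
data = """,'A','B','C','D'
'A', 0, 3, 5, -1
'B', 3, 0, 1, 6
'C', 5, 1, 0, 2
'D', -1, 6, 2, 0
"""
# Skip the header row and the label column; read the rest as ints.
arr = np.genfromtxt(StringIO(data), delimiter=',', skip_header=1,
                    usecols=(1, 2, 3, 4), dtype=int)
```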