How to read file with mixed data type into a numpy array in Python?



I'm new to Python. I'm trying to read an existing file with mixed data types into a NumPy array.

The content of the file data.txt (if a comma is not a good delimiter, it can be replaced by a space):

   ,'A','B','C','D'
'A',  0,  3,  5, -1
'B',  3,  0,  1,  6
'C',  5,  1,  0,  2
'D', -1,  6,  2,  0

The expected output numpy array is as follows:

array([[None,'A','B','C','D'],
       ['A',  0,  3,  5, -1 ],
       ['B',  3,  0,  1,  6 ],
       ['C',  5,  1,  0,  2 ],
       ['D', -1,  6,  2,  0 ]])

Solution

  • You could use pandas.read_csv:

    >>> import pandas as pd
    
    >>> df = pd.read_csv('data.txt', index_col=0, sep=',')
    >>> print(df)
         'A'  'B'  'C'  'D'
    
    'A'    0    3    5   -1
    'B'    3    0    1    6
    'C'    5    1    0    2
    'D'   -1    6    2    0
    

    You can then access the underlying array with .values:

    >>> df.values
    array([[ 0,  3,  5, -1],
           [ 3,  0,  1,  6],
           [ 5,  1,  0,  2],
           [-1,  6,  2,  0]], dtype=int64)
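
The row and column labels are not lost: they are kept in df.index and df.columns. The sketch below is one possible way to get them without the literal quote characters, assuming labels are wrapped in single quotes as in the question. It reads from an in-memory copy of the sample (the SAMPLE name is made up for illustration) so it runs without data.txt on disk.

```python
import io

import pandas as pd

# In-memory stand-in for data.txt (hypothetical name, same content as the question).
SAMPLE = """\
   ,'A','B','C','D'
'A',  0,  3,  5, -1
'B',  3,  0,  1,  6
'C',  5,  1,  0,  2
'D', -1,  6,  2,  0
"""

df = pd.read_csv(
    io.StringIO(SAMPLE),
    index_col=0,            # first column holds the row labels
    quotechar="'",          # treat single quotes as quoting, so 'A' becomes A
    skipinitialspace=True,  # ignore the padding spaces after each comma
)

row_labels = df.index.tolist()    # labels taken from the first column
col_labels = df.columns.tolist()  # labels taken from the first row
data = df.to_numpy()              # numeric block only, as a 2D array
```

Keeping the labels in the index and the numbers in the array means the numeric part stays a regular integer array instead of falling back to object dtype.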
    

    As far as I know, it's not possible to read that file into a plain (non-object) 2D array: a NumPy array has a single dtype for all of its elements, and even a record array requires every column to hold one type. That could work for the second through last rows (str, int, int, int, int), but not for the first row (NoneType, str, str, str, str). With pandas, at least, you can treat the first row and first column as the column labels and the index, which can have a different type than the data.
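
If an object array is acceptable, the exact layout from the question can be approximated by parsing each cell by hand. This is only a minimal sketch, not a NumPy feature: parse_cell, read_mixed and SAMPLE are hypothetical names, and the rule "empty cell becomes None, quoted cell becomes str, anything else becomes int" is an assumption based on the sample file.

```python
import csv
import io

import numpy as np

# In-memory stand-in for data.txt (hypothetical name, same content as the question).
SAMPLE = """\
   ,'A','B','C','D'
'A',  0,  3,  5, -1
'B',  3,  0,  1,  6
'C',  5,  1,  0,  2
'D', -1,  6,  2,  0
"""


def parse_cell(text):
    """Convert one raw cell: blank -> None, 'quoted' -> str, otherwise int."""
    text = text.strip()
    if not text:
        return None
    if len(text) >= 2 and text.startswith("'") and text.endswith("'"):
        return text[1:-1]
    return int(text)


def read_mixed(fileobj):
    """Read comma-separated rows into a 2D array of Python objects."""
    rows = [
        [parse_cell(cell) for cell in row]
        for row in csv.reader(fileobj)
        if row  # skip completely empty lines
    ]
    return np.array(rows, dtype=object)


arr = read_mixed(io.StringIO(SAMPLE))
```

With a real file, the same function could be called as read_mixed(open('data.txt', newline='')). Note that an object array stores references to Python objects, so vectorized arithmetic on it is much slower than on an integer array.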

    However, if you don't need the first row and column, you could use np.loadtxt:

    >>> import numpy as np
    
    >>> np.loadtxt('data.txt', delimiter=',', skiprows=1, usecols=[1,2,3,4], dtype=int)
    array([[ 0,  3,  5, -1],
           [ 3,  0,  1,  6],
           [ 5,  1,  0,  2],
           [-1,  6,  2,  0]])
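
If the labels are needed after all, one option is to read them in a second pass as a separate string array. The sketch below assumes the labels sit in the first column and are wrapped in single quotes, as in the question; SAMPLE is a made-up in-memory stand-in for the file so that the example runs on its own.

```python
import io

import numpy as np

# In-memory stand-in for data.txt (hypothetical name, same content as the question).
SAMPLE = """\
   ,'A','B','C','D'
'A',  0,  3,  5, -1
'B',  3,  0,  1,  6
'C',  5,  1,  0,  2
'D', -1,  6,  2,  0
"""

# First pass: only the label column, read as strings.
labels = np.loadtxt(
    io.StringIO(SAMPLE), delimiter=",", skiprows=1, usecols=0, dtype=str
)
labels = np.char.strip(labels, "'")  # drop the surrounding single quotes

# Second pass: only the numeric columns, read as integers.
values = np.loadtxt(
    io.StringIO(SAMPLE), delimiter=",", skiprows=1, usecols=[1, 2, 3, 4], dtype=int
)
```

Since the matrix in the sample is symmetric, the same labels array can describe both rows and columns; for a file where they differ, the header row would have to be parsed separately.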