python, numpy, io, genfromtxt

genfromtxt dtype=None not getting a 2d-array


I'm using this line to read a file:

data_train = np.genfromtxt(filename, delimiter=' ', autostrip=True, dtype=float, missing_values="", filling_values='0')

Since the values in a column may not share the same type, I'm getting a one-dimensional array (the same happens if I use dtype=None). However, every value is either an integer, a float, or missing.

Can I fix this and get a 2d-array?

For example:

1, 2, 3, 4, 3.3, , 2.2, 1  
1.1, 2.2, 4, , , , ,

Solution

  • You can use:

    np.nan_to_num(np.genfromtxt('test.txt', delimiter=','))
    

    where np.nan_to_num() replaces the nan entries, which genfromtxt creates wherever data is missing, with 0. For your example this gives:

    array([[ 1. ,  2. ,  3. ,  4. ,  3.3,  0. ,  2.2,  1. ],
           [ 1.1,  2.2,  4. ,  0. ,  0. ,  0. ,  0. ,  0. ]])
    

    EDIT: as clarified by @unutbu, @Warren Weckesser and in the discussion below, depending on your system you can simply do the following (for me, on Windows 7 64-bit, Python 2.7.8 64-bit and NumPy 1.9.0, it doesn't work):

    np.genfromtxt('test.txt', filling_values=0, delimiter=',')
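For anyone who wants to try the nan_to_num approach without creating a file, here is a minimal sketch. It assumes a recent NumPy on Python 3 and feeds the two example rows from the question through io.StringIO in place of 'test.txt'. The names SAMPLE and load_filled are made up for illustration.

```python
import io

import numpy as np

# The two example rows from the question, held in memory instead of a file.
SAMPLE = "1, 2, 3, 4, 3.3, , 2.2, 1  \n1.1, 2.2, 4, , , , ,\n"


def load_filled(source):
    """Read comma-separated numbers and replace missing entries with 0."""
    # With the default float dtype, empty fields come back as nan.
    raw = np.genfromtxt(source, delimiter=",")
    # nan_to_num turns nan into 0.0 (it also maps +/-inf to large finite values).
    return np.nan_to_num(raw)


filled = load_filled(io.StringIO(SAMPLE))
print(filled.shape)  # (2, 8)
print(filled)
```

Note that nan_to_num also replaces infinities, so if the file can legitimately contain inf, something like np.where(np.isnan(raw), 0, raw) might be a safer choice.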