python-2.7 csv matrix genfromtxt

Cannot use column slicing (correctly) in a matrix with data read from a CSV in Python


I am trying to read a CSV file (one column of strings and one of integers) into a matrix using genfromtxt. I then want to slice out only the column containing the string values and load it into an array for further processing.

CSV File:

explore,1043
 sky,   585
 nikon, 552
 2007,  552
 ....  

I use genfromtxt to load the csv:

import numpy as np
my_data = np.genfromtxt('c:/tags.csv', delimiter=',')

When I then slice the matrix to get only the column containing the strings:

print my_data[:,0]

I get the following:

[   nan    nan    nan  2007.    nan    nan    nan    nan    nan    nan ....

This looks like a data type problem, so I then try specifying the data types of the columns in the CSV:

my_data = np.genfromtxt('c:/tags.csv', dtype = [('mystring','S5'), ('myint','i8')], delimiter=',')

Now I get an array of tuples instead of a matrix:

[('flower', 1043L) ('sky', 585L) ('nikon', 552L) ('2007', 552L) ..... ]

What am I doing wrong?


Solution

  • If you are only interested in the first column, you can load the CSV as a 2D array of strings:

    my_data = np.genfromtxt('c:/tags.csv', delimiter=',', dtype='S')
    print my_data[:, 0]
    

    Result:

    ['explore' 'sky' 'nikon' '2007']
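
If you need both columns, the "array of tuples" from the question is a NumPy structured array, and each column can be pulled out by field name instead of by position. The sketch below is only an illustration under a few assumptions: it reads the question's sample rows from an in-memory string rather than from c:/tags.csv, the names CSV_TEXT and load are hypothetical helpers, the string field is widened to 'S7' so that 'explore' is not cut short, and autostrip=True is added to drop the leading spaces in the sample. It also reproduces the nan result, which comes from genfromtxt defaulting to a float dtype.

```python
import io
import numpy as np

# Hypothetical stand-in for c:/tags.csv, built from the rows shown in the question.
CSV_TEXT = u"explore,1043\n sky,   585\n nikon, 552\n 2007,  552\n"


def load(**kwargs):
    # genfromtxt accepts a file-like object, so no file on disk is needed here.
    return np.genfromtxt(io.StringIO(CSV_TEXT), delimiter=',', **kwargs)


# Default dtype is float: text that cannot be parsed as a number becomes nan,
# which is why only the '2007' row survives in column 0.
as_float = load()
print(as_float[:, 0])

# With one dtype per column the result is a 1D structured array.
# It cannot be sliced with [:, 0]; index it by field name instead.
structured = load(dtype=[('mystring', 'S7'), ('myint', 'i8')], autostrip=True)
tags = structured['mystring']    # the string column
counts = structured['myint']     # the integer column
print(tags)
print(counts)
```

The print calls are written with parentheses so that the same lines run on both Python 2.7 and Python 3. On Python 3 the 'S7' field holds byte strings, so the tags print with a b prefix.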