I am trying to read a CSV file (one column of strings and one of integers) into a matrix using genfromtxt, then use slicing to get only the string column and load it into an array for further processing.
CSV File:
explore,1043
sky, 585
nikon, 552
2007, 552
....
I use genfromtxt to load the CSV:
my_data = np.genfromtxt('c:/tags.csv', delimiter=',')
and when I try to slice the matrix to get only the column containing the strings:
print my_data[:,0]
I get the following:
[ nan nan nan 2007. nan nan nan nan nan nan ....
It seems to be a problem with the data type, so I try to specify the data types contained in the CSV:
my_data = np.genfromtxt('c:/tags.csv', dtype = [('mystring','S5'), ('myint','i8')], delimiter=',')
This time I get an array of tuples instead of a matrix:
[('flower', 1043L) ('sky', 585L) ('nikon', 552L) ('2007', 552L) ..... ]
What am I doing wrong?
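For reference, here is a minimal sketch of what the two attempts above actually produce. It assumes Python 3 and NumPy 1.14 or later (for the `encoding` argument), and it reads the four sample rows from an in-memory string instead of c:/tags.csv; `CSV_TEXT`, `floats` and `records` are illustrative names only. The first call shows why the strings turn into nan, and the second shows that a dtype given as a list of (name, type) pairs returns a 1D structured array whose columns are accessed by field name rather than by `[:, 0]`.

```python
import io

import numpy as np

# Stand-in for c:/tags.csv: the four sample rows from the question
CSV_TEXT = "explore,1043\nsky, 585\nnikon, 552\n2007, 552\n"

# Attempt 1: genfromtxt's default dtype is float, so every field that cannot
# be parsed as a number becomes nan; only "2007" survives, as 2007.0.
floats = np.genfromtxt(io.StringIO(CSV_TEXT), delimiter=",")
print(floats.shape)  # (4, 2): a real 2D array, but the text is gone
print(floats[:, 0])

# Attempt 2: a list of (name, type) pairs builds a 1D *structured* array.
# Each element is one record (shown as a tuple), so there is no second axis
# for [:, 0]; a column is selected by its field name instead.
# "U10" is used instead of "S5" because "S5" keeps at most 5 bytes
# ("explore" would be cut to "explo") and yields bytes under Python 3.
records = np.genfromtxt(
    io.StringIO(CSV_TEXT),
    delimiter=",",
    dtype=[("mystring", "U10"), ("myint", "i8")],
    autostrip=True,  # drop the spaces that follow the commas
    encoding="utf-8",
)
print(records.shape)  # (4,)
print(records["mystring"])  # the string column as its own 1D array
print(records["myint"])
```

So the structured version is not wrong as such: `records["mystring"]` already is the array of tags.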
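By default, genfromtxt converts every field to float, so any string that cannot be parsed as a number comes out as nan (only "2007" is numeric). If you are only interested in the first column, you can load the whole CSV as a 2D array of strings and then slice it: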
my_data = np.genfromtxt('c:/tags.csv', delimiter=',', dtype='S')
print my_data[:, 0]
Result:
['explore' 'sky' 'nikon' '2007']
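A variant, if only the tag column is ever needed: genfromtxt's `usecols` argument reads just that column, so the result is already 1D and no slicing is required. This sketch assumes Python 3 and NumPy 1.14 or later (for `encoding`), and uses `dtype=str` instead of `dtype='S'`, because under Python 3 `'S'` gives byte strings such as `b'explore'`. The in-memory `CSV_TEXT` stands in for c:/tags.csv, and `tags` is an illustrative name.

```python
import io

import numpy as np

# Stand-in for c:/tags.csv: the four sample rows from the question
CSV_TEXT = "explore,1043\nsky, 585\nnikon, 552\n2007, 552\n"

# usecols=0 reads only the first column, so genfromtxt returns a 1D array.
# dtype=str yields text strings (not bytes) under Python 3; passing an
# explicit encoding avoids NumPy's warning about reading unicode strings.
tags = np.genfromtxt(
    io.StringIO(CSV_TEXT),
    delimiter=",",
    usecols=0,
    dtype=str,
    encoding="utf-8",
)
print(tags.shape)  # (4,)
print(tags)
```

Because the integer column is never parsed, this also sidesteps the float conversion entirely.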