Below is the content of myfile.csv:
1st 2nd 3rd 4th 5th
2061100 10638650 -8000 25 [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
2061800 10639100 -8100 26 [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
2061150 10638750 -8250 25 [3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0]
2061650 10639150 -8200 25 [4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0]
2061350 10638800 -8250 3 [5.0, 5.0, 5.0]
2060950 10638700 -8000 1 [1.0]
2061700 10639100 -8100 11 [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
2061050 10638800 -8250 6 [3.0, 3.0, 3.0, 3.0, 3.0, 3.0]
2061500 10639150 -8200 1 [4.0]
2061250 10638850 -8150 16 [5.0, 5.0, 5.0, 5.0]
My code:
import numpy as np
from numpy import genfromtxt
mydata = genfromtxt('myfile.csv', delimiter=',')
arr = np.array(mydata)  # genfromtxt already returns an ndarray; this makes a copy
col5 = arr[:,4]
I want to read the 5th column from the CSV file. However, each element in the 5th column is a list, not a single value. How can I revise my code?
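A minimal sketch of what the `genfromtxt` call actually produces, assuming myfile.csv is comma-separated as `delimiter=','` implies (the two rows below are shortened to three list elements for illustration). The commas inside the brackets also act as delimiters, so each list is split into several columns, and the pieces that carry a bracket are not valid floats and come back as NaN:

```python
import io
import numpy as np

# Two shortened rows in the assumed comma-separated layout of myfile.csv
sample = io.StringIO(
    "2061100,10638650,-8000,25,[1.0, 1.0, 1.0]\n"
    "2061800,10639100,-8100,26,[2.0, 2.0, 2.0]\n"
)
arr = np.genfromtxt(sample, delimiter=",")

print(arr.shape)   # (2, 7): the bracketed list became three separate columns
print(arr[:, 4])   # [nan nan]: "[1.0" and "[2.0" cannot be parsed as floats
print(arr[:, 5])   # [1. 2.]: the middle element survives as a plain number
```

With the full file, rows whose lists have different lengths also produce different column counts, and `genfromtxt` then raises a `ValueError` reporting the mismatch (unless `invalid_raise=False` is passed).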
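Use pandas to read your CSV file and then slice out the columns you need. With genfromtxt, the bracketed fields such as [1.0 cannot be converted to floats, so they are read as NaN; pandas keeps them as strings instead, which avoids the NaN. Example below (I have only used the first four rows, which all have seven list elements; rows whose lists are longer than the first row's would make read_csv raise a tokenizing error):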
>>> import pandas as pd
>>> import numpy
>>> df = pd.read_csv("stack.csv", header=None, na_values=" NaN")
>>> df
0 1 2 3 4 5 6 7 8 9 10
0 2061100 10638650 -8000 25 [1.0 1 1 1 1 1 1.0]
1 2061800 10639100 -8100 26 [2.0 2 2 2 2 2 2.0]
2 2061150 10638750 -8250 25 [3.0 3 3 3 3 3 3.0]
3 2061650 10639150 -8200 25 [4.0 4 4 4 4 4 4.0]
>>> x = df.loc[:, 4:10]  # .ix was removed in pandas 1.0; .loc slices labels 4 to 10 inclusive
>>> x
4 5 6 7 8 9 10
0 [1.0 1 1 1 1 1 1.0]
1 [2.0 2 2 2 2 2 2.0]
2 [3.0 3 3 3 3 3 3.0]
3 [4.0 4 4 4 4 4 4.0]
>>> x = numpy.array(x)
>>> x
array([['[1.0', 1.0, 1.0, 1.0, 1.0, 1.0, ' 1.0]'],
[' [2.0', 2.0, 2.0, 2.0, 2.0, 2.0, ' 2.0]'],
['[3.0', 3.0, 3.0, 3.0, 3.0, 3.0, ' 3.0]'],
[' [4.0', 4.0, 4.0, 4.0, 4.0, 4.0, ' 4.0]']], dtype=object)
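The array above still holds the bracket fragments ('[1.0', ' 1.0]') as strings, and lists of different lengths change the field count per row. A minimal sketch of a different technique that sidesteps both, assuming every data line is four comma-separated numbers followed by one bracketed list (the layout is inferred from the question; read_fifth_column is a hypothetical helper name): split each line on the first four commas only, then parse the remainder with ast.literal_eval.

```python
import ast
import io


def read_fifth_column(lines):
    """Hypothetical helper: return the bracketed 5th field of each line as a list of floats.

    Assumes each line looks like a,b,c,d,[x1, x2, ...].
    """
    col5 = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # maxsplit=4 keeps the commas inside the brackets in the last field
        *_, rest = line.split(",", 4)
        # literal_eval safely turns "[5.0, 5.0]" into a Python list
        col5.append([float(v) for v in ast.literal_eval(rest.strip())])
    return col5


# Rows with lists of different lengths, as in the later part of myfile.csv
sample = io.StringIO(
    "2061350,10638800,-8250,3,[5.0, 5.0, 5.0]\n"
    "2060950,10638700,-8000,1,[1.0]\n"
    "2061250,10638850,-8150,16,[5.0, 5.0, 5.0, 5.0]\n"
)
col5 = read_fifth_column(sample)
print(col5)  # [[5.0, 5.0, 5.0], [1.0], [5.0, 5.0, 5.0, 5.0]]
```

For the real file, pass the open file object instead: `with open('myfile.csv') as f: col5 = read_fifth_column(f)`. If the file starts with a header row like the one shown in the question, skip it first with `next(f)`.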