
Read any column from csv file


Below is the content of myfile.csv:

  1st        2nd     3rd      4th                     5th
2061100   10638650  -8000     25         [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
2061800   10639100  -8100     26         [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
2061150   10638750  -8250     25         [3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0]
2061650   10639150  -8200     25         [4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0]
2061350   10638800  -8250     3          [5.0, 5.0, 5.0]
2060950   10638700  -8000     1          [1.0]
2061700   10639100  -8100     11         [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
2061050   10638800  -8250     6          [3.0, 3.0, 3.0, 3.0, 3.0, 3.0]
2061500   10639150  -8200     1          [4.0]
2061250   10638850  -8150     16         [5.0, 5.0, 5.0, 5.0]

My code:

import numpy as np
from numpy import genfromtxt

mydata = genfromtxt('myfile.csv', delimiter=',')
arr = np.array(mydata)
col5 = arr[:,4]   # index 4 is the 5th column

I want to read the 5th column from the CSV file. However, each element in the 5th column is a list, not a single value.

How can I revise my code?
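A quick way to see the problem is to split a single row the way a comma delimiter does. This is a sketch that assumes myfile.csv is really comma-separated with the list written unquoted (the `delimiter=','` in the code suggests so, although the listing above looks space-aligned):

```python
import csv
import io

# One row of myfile.csv, assuming the fields are comma-separated and the list is unquoted.
line = "2061100,10638650,-8000,25,[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]"

# csv.reader (like genfromtxt with delimiter=',') splits on every comma,
# including the commas inside the brackets.
fields = next(csv.reader(io.StringIO(line)))

print(len(fields))         # 11, not 5
print(repr(fields[4]))     # '[1.0'
print(repr(fields[-1]))    # ' 1.0]'
```

So "column 5" only holds the first piece of the list, and pieces such as '[1.0' and ' 1.0]' are not numbers, which is why genfromtxt stores NaN for them.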


Solution

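  • Use pandas to read your CSV file and then slice out the columns that hold the list. Because the list in the 5th column contains commas, splitting on commas breaks it into separate fields, and genfromtxt turns pieces such as "[1.0" and " 1.0]" into NaN because they are not numbers. Example below (I used only the first four rows, where every list has seven elements; on your full file, rows with shorter lists are padded with NaN and rows with longer lists make read_csv raise a ParserError):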

     >>> import pandas as pd
     >>> import numpy
     >>> df = pd.read_csv("stack.csv", header=None, na_values=" NaN")
     >>> df
              0         1     2   3     4  5  6  7  8  9    10
     0  2061100  10638650 -8000  25  [1.0  1  1  1  1  1  1.0]
     1  2061800  10639100 -8100  26  [2.0  2  2  2  2  2  2.0]
     2  2061150  10638750 -8250  25  [3.0  3  3  3  3  3  3.0]
     3  2061650  10639150 -8200  25  [4.0  4  4  4  4  4  4.0]
     >>> x = df.iloc[:, 4:11]  # positions 4 to 10; df.ix was removed in pandas 1.0
     >>> x
           4  5  6  7  8  9    10
     0  [1.0  1  1  1  1  1  1.0]
     1  [2.0  2  2  2  2  2  2.0]
     2  [3.0  3  3  3  3  3  3.0]
     3  [4.0  4  4  4  4  4  4.0]
     >>> x = numpy.array(x)
     >>> x
     array([['[1.0', 1.0, 1.0, 1.0, 1.0, 1.0, ' 1.0]'],
            [' [2.0', 2.0, 2.0, 2.0, 2.0, 2.0, ' 2.0]'],
            ['[3.0', 3.0, 3.0, 3.0, 3.0, 3.0, ' 3.0]'],
            [' [4.0', 4.0, 4.0, 4.0, 4.0, 4.0, ' 4.0]']], dtype=object)
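If the goal is to get each 5th-column entry back as a real list, including rows whose lists have different lengths, one option is to split each line on its first four commas only and parse the rest with `ast.literal_eval`. This is a minimal standard-library sketch, not the approach used above; it assumes comma-separated fields, an unquoted Python-style list in the last field, and no header row. The sample rows are taken from the question, and `read_fifth_column` is a hypothetical helper name:

```python
import ast
import io

# Sample rows from the question, assuming commas between fields
# and an unquoted list in the 5th field.
SAMPLE = """2061100,10638650,-8000,25,[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
2061350,10638800,-8250,3,[5.0, 5.0, 5.0]
2060950,10638700,-8000,1,[1.0]
"""

def read_fifth_column(f):
    """Return the 5th field of every non-empty line as a Python list."""
    col5 = []
    for line in f:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # maxsplit=4: split on the first four commas only,
        # so the whole list stays together in fields[4].
        fields = line.split(",", 4)
        # literal_eval safely turns "[1.0, 1.0]" into [1.0, 1.0].
        col5.append(ast.literal_eval(fields[4]))
    return col5

col5 = read_fifth_column(io.StringIO(SAMPLE))
print(col5)

# With the real file:
# with open('myfile.csv') as f:
#     col5 = read_fifth_column(f)
```

If the file does have a header row, call `next(f)` once before passing it in. The result is a plain list of lists; turning it into a regular 2-D NumPy array only works when all the lists have the same length.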