Python TypeError: list indices must be integers, not tuple


I'm using the Python 2.7 shell on OS X Lion. The .csv file has 12 columns and 892 rows.

import csv as csv
import numpy as np
# Open up csv file into a Python object
csv_file_object = csv.reader(open('/Users/scdavis6/Documents/Kaggle/train.csv', 'rb'))
header = csv_file_object.next()
data=[]
for row in csv_file_object:
    data.append(row)
    data = np.array(data)

# Convert to float for numerical calculations
number_passengers = np.size(data[0::,0].astype(np.float))

And this is the error I get:

Traceback (most recent call last):
  File "<pyshell#5>", line 1, in <module>
    number_passengers = np.size(data[0::,0].astype(np.float))
TypeError: list indices must be integers, not tuple 

What am I doing wrong?


Solution

  • Don't use csv to read the data into a NumPy array. Use numpy.genfromtxt; using dtype=None will cause genfromtxt to make an intelligent guess at the dtypes for you. By doing it this way you won't have to manually convert strings to floats.

    data[0::, 0] just gives you the first column of data. data[:, 0] would give you the same result.
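A minimal sketch of the slicing point above, assuming a small made-up array of strings in place of the real CSV contents (the values and the names `first_col_a` and `first_col_b` are illustrative only):

```python
import numpy as np

# Hypothetical stand-in for the parsed CSV rows: every field is a string.
data = np.array([['1', '0', '22'],
                 ['2', '1', '38'],
                 ['3', '1', '26']])

first_col_a = data[0::, 0]  # slice style used in the question
first_col_b = data[:, 0]    # shorter equivalent

# Both slices select the first column of every row.
same = np.array_equal(first_col_a, first_col_b)

# Converting the string column to floats, as the question intends.
ids = first_col_a.astype(float)
number_passengers = np.size(ids)
```

Both forms only work once `data` is an ndarray; a plain list rejects the comma-separated index.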

    The error message

    TypeError: list indices must be integers, not tuple 
    

    suggests that for some reason your data variable might be holding a list rather than an ndarray. For example, the same exception can be produced like this:

    In [73]: data = [1,2,3]
    
    In [74]: data[1,2]
    TypeError: list indices must be integers, not tuple
    

    I don't know why that is happening, but if you post a sample of your CSV we should be able to help fix that.
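One way to check which case applies is to inspect `type(data)` before indexing. The sketch below assumes a couple of made-up rows standing in for what `csv.reader` yields; it shows the tuple index failing on a list and working after an explicit `np.array` conversion. The names `rows` and `message` are illustrative.

```python
import numpy as np

# Hypothetical rows, shaped like the lists of strings csv.reader produces.
rows = [['1', '0', '22'], ['2', '1', '38']]

was_list = isinstance(rows, list)

try:
    rows[0::, 0]          # a list cannot be indexed with a tuple
    message = None
except TypeError as exc:
    message = str(exc)    # exact wording differs between Python versions

# After conversion, the same index expression is valid.
data = np.array(rows)
is_ndarray = isinstance(data, np.ndarray)
first_column = data[0::, 0].astype(float)
```

If `type(data)` reports `list` at the point of the failing line, the conversion to an ndarray did not take effect in that shell session.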

    Using np.genfromtxt, your current code could be simplified to:

    import numpy as np
    filename = '/Users/scdavis6/Documents/Kaggle/train.csv'
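    # skip_header=1 skips the header row (the older skiprows argument is no longer accepted by genfromtxt)
    data = np.genfromtxt(filename, delimiter=',', skip_header=1, dtype=None)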
    number_passengers = np.size(data, axis=0)
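To try this without the real file, here is a small sketch that feeds `np.genfromtxt` an in-memory CSV. The column names and values are invented for illustration, and `names=True` is an optional extra (not part of the code above) that takes the field names from the header row instead of skipping it.

```python
import io
import numpy as np

# Made-up CSV text; the real train.csv has 12 columns and many more rows.
csv_text = u"PassengerId,Survived,Age\n1,0,22.0\n2,1,38.0\n3,1,26.0\n"

# dtype=None lets genfromtxt guess a type for each column;
# names=True reads the column names from the first line.
data = np.genfromtxt(io.StringIO(csv_text), delimiter=',',
                     names=True, dtype=None)

number_passengers = np.size(data, axis=0)  # one record per data row
ages = data['Age']                         # columns are accessed by name
```

Because the columns have different types, the result is a one-dimensional structured array, so columns are selected by field name rather than with `data[:, 0]`.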