Python TypeError: list indices must be integers, not tuple

I'm using the Python 2.7 shell on OS X Lion. The .csv file has 12 columns and 892 rows.

import csv as csv
import numpy as np
# Open up csv file into a Python object
csv_file_object = csv.reader(open('/Users/scdavis6/Documents/Kaggle/train.csv', 'rb'))
header = csv_file_object.next()
data=[]
for row in csv_file_object:
    data.append(row)
    data = np.array(data)

# Convert to float for numerical calculations
number_passengers = np.size(data[0::,0].astype(np.float))

And this is the error I get:

Traceback (most recent call last):
  File "<pyshell#5>", line 1, in <module>
    number_passengers = np.size(data[0::,0].astype(np.float))
TypeError: list indices must be integers, not tuple

What am I doing wrong?

Solution

Don't use csv to read the data into a NumPy array. Use numpy.genfromtxt; using dtype=None will cause genfromtxt to make an intelligent guess at the dtypes for you. By doing it this way you won't have to manually convert strings to floats.

data[0::, 0] just gives you the first column of data. data[:, 0] would give you the same result.
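
To illustrate that the two slices are equivalent, here is a minimal sketch. The small `sample` array is a hypothetical stand-in for your data, not your actual CSV contents:

```python
import numpy as np

# Hypothetical 3x2 array standing in for the CSV contents.
sample = np.array([[1, 10], [2, 20], [3, 30]])

first_col_a = sample[0::, 0]  # start at row 0, no stop: every row, column 0
first_col_b = sample[:, 0]    # the same selection, written more simply

print(np.array_equal(first_col_a, first_col_b))
```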

The error message

TypeError: list indices must be integers, not tuple

suggests that for some reason your data variable might be holding a list rather than an ndarray. For example, the same exception can be produced like this:

In [73]: data = [1,2,3]

In [74]: data[1,2]
TypeError: list indices must be integers, not tuple

I don't know why that is happening, but if you post a sample of your CSV we should be able to help fix that.
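
In the meantime, one thing worth checking is the type of data just before the failing line. The sketch below assumes the conversion to an array may not have run after the loop finished; the `rows` list is a hypothetical stand-in for what csv.reader yields (every field is a string):

```python
import numpy as np

# Hypothetical rows standing in for what csv.reader would yield.
rows = [['1', '0', '3'], ['2', '1', '1'], ['3', '1', '3']]

data = []
for row in rows:
    data.append(row)

print(type(data))  # still a plain list here, so data[:, 0] would fail

# Convert once, after the loop has finished; 2-D indexing is then valid.
data = np.array(data)
first_column = data[:, 0].astype(float)
number_passengers = np.size(first_column)
print(number_passengers)
```

If type(data) reports a list right before the indexing line, the np.array call did not take effect where you expected it to.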

Using np.genfromtxt, your current code could be simplified to:

import numpy as np
filename = '/Users/scdavis6/Documents/Kaggle/train.csv'
data = np.genfromtxt(filename, delimiter=',', skip_header=1, dtype=None)
number_passengers = np.size(data, axis=0)
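
If you want to try this without the file, here is a self-contained sketch. The `lines` list is made-up data (genfromtxt accepts a list of strings in place of a filename), and the column names are only illustrative. Note that when the columns have different types, dtype=None gives you a 1-D structured array, so columns are selected by field name (f0, f1, ... by default) rather than with data[:, 0]:

```python
import numpy as np

# Hypothetical CSV content: one header line followed by three numeric rows.
lines = [
    "PassengerId,Survived,Age",
    "1,0,22.0",
    "2,1,38.0",
    "3,1,26.0",
]

# skip_header=1 drops the header line; dtype=None infers a type per column.
data = np.genfromtxt(lines, delimiter=',', skip_header=1, dtype=None)
number_passengers = np.size(data, axis=0)

# Mixed int/float columns produce a structured array with default field names.
first_column = data['f0']
print(number_passengers)
```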