python arrays csv numpy genfromtxt

How do you allow for text qualifiers using NumPy's genfromtxt?


I am trying to import some comma-delimited text data into an array using the NumPy library in Python. I am using the following code:

import numpy as np
data = np.genfromtxt(fname, delimiter=',')

I get the following error:

Line #2 (got 12 columns instead of 11)

for every line after the header.

The reason appears to be that one of the columns contains a comma. The file deals with this by wrapping that column's data in text qualifiers ("), but genfromtxt still splits on the comma inside them. The Python csv library handles this by default, e.g.:

import csv
reader = csv.reader(open(fname, newline=''))
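To see where the extra column comes from, here is a minimal sketch that compares a naive comma split with csv parsing. The row is copied from the sample data in this question; the variable names are just for illustration.

```python
import csv

# One data row copied from the sample in the question.
row = '0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S'

# A plain split treats the comma inside the quoted name as a delimiter.
naive_fields = row.split(',')

# csv.reader honours the double-quote text qualifier by default.
quoted_fields = next(csv.reader([row]))

print(len(naive_fields))   # 12
print(len(quoted_fields))  # 11
print(quoted_fields[2])    # Braund, Mr. Owen Harris
```

The naive split yields 12 fields, which matches the count in the genfromtxt error, while the csv parser yields the expected 11.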

I know I could read the data with the csv library and then convert it to an array, but I wondered whether one of NumPy's text-to-array functions, such as genfromtxt, can do this directly. I have checked the help for genfromtxt, but none of the listed arguments appears to describe what I am looking for, unless I am missing something.
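For reference, the csv-then-convert route might look like the sketch below. It is only an illustration: io.StringIO stands in for the real file so the example is self-contained, and every field is kept as a string until a column is converted explicitly.

```python
import csv
import io

import numpy as np

# Stand-in for open(fname, newline=''); the rows are the sample from the question.
sample = io.StringIO(
    'survived,pclass,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked\n'
    '0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S\n'
    '1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C\n'
    '1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S\n'
)

reader = csv.reader(sample)
header = next(reader)   # first row holds the column names
rows = list(reader)     # remaining rows, each a list of 11 strings

# Every row has the same length, so this becomes a 2-D array of strings.
data = np.array(rows)

# Pull out one numeric column by name and convert it explicitly.
age = data[:, header.index('age')].astype(float)

print(data.shape)  # (3, 11)
print(age)
```

Because the array holds strings, each numeric column needs its own conversion, which is the main cost of this approach for mixed data.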

In case it helps, here is a sample of a few lines from the file:

survived,pclass,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked
0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S

I assume the name column is causing the issue.


Solution

  • NumPy arrays are not well suited to mixed text and categorical data like yours. You may be better off using pandas, whose read_csv treats double-quoted fields as single values by default:

    import pandas
    data = pandas.read_csv(fname)
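As a quick check, here is a minimal sketch that runs read_csv on the sample rows from the question. io.StringIO stands in for the real file, and the choice of numeric columns to pass on to NumPy is an assumption made for illustration.

```python
import io

import pandas

# Stand-in for the real file; the rows are the sample from the question.
sample = io.StringIO(
    'survived,pclass,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked\n'
    '0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S\n'
    '1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C\n'
    '1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S\n'
)

data = pandas.read_csv(sample)

# The quoted name stays in one column, so the frame has 11 columns.
print(data.shape)       # (3, 11)
print(data['name'][0])  # Braund, Mr. Owen Harris

# Numeric columns can be handed to NumPy once the text columns are set aside.
numeric_columns = ['survived', 'pclass', 'age', 'sibsp', 'parch', 'fare']
numeric = data[numeric_columns].to_numpy(dtype=float)
print(numeric.shape)    # (3, 6)
```

If a plain NumPy array is still needed downstream, selecting the numeric columns first avoids ending up with an object array.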