python pandas csv parse-error

How to view single row from csv with pandas


I got this csv file from https://www.kaggle.com/currie32/crimes-in-chicago

I tried to read the 2008-2011 csv into a dataframe using pandas and got a ParserError stating that a certain row of the csv contains 41 fields where 23 were expected.

ParserError: Error tokenizing data. C error: Expected 23 fields in line 1149094, saw 41

I used this command to read the csv by simply skipping any bad rows:

CHIcrime_df2 = pd.read_csv(path, error_bad_lines=False)
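
In pandas 1.3 and later, error_bad_lines is deprecated in favor of on_bad_lines, and it was removed in pandas 2.0. A minimal sketch of the equivalent call, assuming a recent pandas and using a made-up three-column sample in place of the Kaggle file:

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real CSV:
# the second data row has two extra fields.
sample = io.StringIO(
    "id,type,year\n"
    "1,THEFT,2008\n"
    "2,BATTERY,2010,extra,extra\n"
    "3,ASSAULT,2011\n"
)

# on_bad_lines="skip" (pandas >= 1.3) silently drops rows with too many
# fields, which is what error_bad_lines=False did in older releases.
df = pd.read_csv(sample, on_bad_lines="skip")
print(df)
```

on_bad_lines="warn" skips the same rows but reports each one, which may be preferable when the bad rows themselves are of interest.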

That worked as planned, but I wanted to know what all those extra fields were, so I read the file with csv.reader:

with open('path') as data:
    reader = csv.reader(data)
    interestingrows = [row for idx, row in enumerate(reader) if idx == 1149094]

I expected there to be 41 fields, but there were 23. I also wanted to be sure that I wasn't confusing indexes, so I printed a few before and after; each of them had the same number of fields. Can anyone help me understand what's going on with that?


Solution

  • David Makovoz has already explained the issue, so I'll just answer the question in your title:

    How to view single row from csv with pandas

    If the error occurred at line n (here 1149094), skip the first n-1 rows and read just one row:

    df = pd.read_csv('Chicago_Crimes_2008_to_2011.csv', skiprows=1149093, nrows=1, header=None)
    

    Result:

    >>> print(df.values)
    [[2023517 7818233 'HS626859' '11/21/2010 11:00:00 PM'
      '079XX S JEFFERY BLVD' 460 'BATTERY' 'SIMPLE' 'STREET' False False 414
      4.0 8.0 46.0 '08B' 1190912.0 1852820.0 2010 '02/04/2016 06:33:39 AM'
      41.751151039 '-87.1:00:00 AM' '031XX W LEXINGTON ST' 810 'THEFT'
      'OVER $500' 'STREET' False False 1134 11.0 24.0 27.0 6 nan nan 2008
      '08/17/2015 03:03:40 PM' nan nan nan]]
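
The same line can also be inspected without pandas. The sketch below assumes the line number in the error message counts physical lines starting at 1, with the header as line 1, as the skiprows=n-1 call above implies. The helper raw_line and the sample data are made up for illustration. Note that enumerate starts at 0 by default, so a test like idx == n selects the record after line n.

```python
import csv
import io
from itertools import islice


def raw_line(lines, n):
    """Return physical line n (1-based) from an iterable of lines, or None."""
    return next(islice(lines, n - 1, n), None)


# Hypothetical stand-in for the real file: line 3 has two extra fields.
sample = (
    "id,type,year\n"
    "1,THEFT,2008\n"
    "2,BATTERY,2010,extra,extra\n"
    "3,ASSAULT,2011\n"
)

# Fetch the raw text of line 3, then split it with the csv module
# to count its fields.
line = raw_line(io.StringIO(sample), 3)
fields = next(csv.reader([line]))
print(len(fields), fields)
```

For the real file, pass an open file object instead of the StringIO, for example raw_line(f, 1149094) inside a with open(path, newline="") as f: block. Printing the raw text first shows exactly what is on that line before any parser interprets it.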