I got this csv file from https://www.kaggle.com/currie32/crimes-in-chicago
I tried to read the 2008-2011 CSV into a DataFrame using pandas and got a ParserError stating that a certain row of the CSV has 41 fields where 23 were expected.
ParserError: Error tokenizing data. C error: Expected 23 fields in line 1149094, saw 41
I used this command to read the csv by simply skipping any bad rows:
CHIcrime_df2 = pd.read_csv(path, error_bad_lines=False)
That worked as planned, but I wanted to know what all those extra fields were, so I read the file with csv.reader:
import csv
with open(path) as data:
    reader = csv.reader(data)
    # enumerate() is 0-based, so idx counts records starting from 0
    interestingrows = [row for idx, row in enumerate(reader) if idx == 1149094]
I expected there to be 41 fields, but there were 23. I also wanted to be sure that I wasn't confusing indexes, so I printed a few before and after; each of them had the same number of fields. Can anyone help me understand what's going on with that?
David Makovoz has already explained the issue, so I'll just answer the question itself:
How to view single row from csv with pandas
If the error occurred at line n (1149094), skip n-1 rows and read just 1 row:
df = pd.read_csv('Chicago_Crimes_2008_to_2011.csv', skiprows=1149093, nrows=1, header=None)
Result:
>>> print(df.values)
[[2023517 7818233 'HS626859' '11/21/2010 11:00:00 PM'
'079XX S JEFFERY BLVD' 460 'BATTERY' 'SIMPLE' 'STREET' False False 414
4.0 8.0 46.0 '08B' 1190912.0 1852820.0 2010 '02/04/2016 06:33:39 AM'
41.751151039 '-87.1:00:00 AM' '031XX W LEXINGTON ST' 810 'THEFT'
'OVER $500' 'STREET' False False 1134 11.0 24.0 27.0 6 nan nan 2008
'08/17/2015 03:03:40 PM' nan nan nan]]
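To check the line arithmetic without loading the full file, here is a minimal sketch on a tiny in-memory CSV. The sample data and the two helper functions (read_line_with_pandas, read_line_with_csv) are made up for illustration and are not part of the Chicago dataset or of either library. It assumes the line number in the pandas error message is a 1-based physical line number, which is what the skiprows=n-1 approach above relies on. For the csv module it uses reader.line_num, which counts physical lines read, rather than an enumerate() index, which starts at 0 and counts records.

```python
import csv
import io

import pandas as pd

# Hypothetical sample: physical line 3 has five fields instead of three.
SAMPLE = "a,b,c\n1,2,3\n4,5,6,7,8\n9,10,11\n"


def read_line_with_pandas(source, line_no):
    """Return physical line `line_no` (1-based) as a one-row DataFrame."""
    # Skip the line_no - 1 lines before it, then read exactly one row.
    # header=None keeps that row as data instead of using it as column names.
    return pd.read_csv(source, skiprows=line_no - 1, nrows=1, header=None)


def read_line_with_csv(source, line_no):
    """Return the first record that reaches physical line `line_no` (1-based)."""
    reader = csv.reader(source)
    for row in reader:
        # line_num is the number of physical lines consumed so far,
        # so it stays correct even if a quoted field spans several lines.
        if reader.line_num >= line_no:
            return row
    return None


bad_pd = read_line_with_pandas(io.StringIO(SAMPLE), 3)
bad_csv = read_line_with_csv(io.StringIO(SAMPLE), 3)

print(bad_pd.values.tolist())
print(bad_csv)
```

Both calls should return the five-field line. If the csv.reader attempt in the question showed 23 fields, one thing to check is whether the enumerate() index was pointing one record past the line pandas complained about.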