python csv pandas dataframe data-analysis

CParserError: Error tokenizing data


I'm having trouble reading a CSV file:

import pandas as pd

df = pd.read_csv('Data_Matches_tekha.csv', skiprows=2)

I get

pandas.io.common.CParserError: Error tokenizing data. C error: Expected 1 fields in line 526, saw 5

When I add sep=None to the read_csv call, I get a different error:

Error: line contains NULL byte

I tried adding unicode='utf-8', and I even tried the csv reader, but nothing works with this file.

The CSV file looks fine to me. I checked it and I see nothing wrong with it.



Solution

  • In your actual code, the line is:

    >>> pandas.read_csv("Data_Matches_tekha.xlsx", sep=None)
    

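    You are trying to read an Excel file, not a plain-text CSV, which is why nothing works.

    An .xlsx file is a zipped package of XML parts, so on disk it is binary data and cannot be parsed as delimited text the way a CSV file can. The "line contains NULL byte" error is a symptom of this: the parser hit raw binary bytes where it expected text.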

    You have two options. The first is to convert the Excel file to CSV and read the result with read_csv. Note that if the workbook has multiple sheets, each sheet should be converted to its own CSV file.

    The second is to read the workbook directly with read_excel, or with a library designed for Excel formats such as xlrd (since version 2.0, xlrd reads only legacy .xls files; openpyxl is the usual choice for .xlsx). See Reading/parsing Excel (xls) files with Python for more information on that.
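Both options can be sketched in a few lines. This is only an illustration, not the answerer's code: the helper names sniff_format and excel_to_csvs are made up for this example, the check for xl/workbook.xml assumes a standard .xlsx package layout, and excel_to_csvs assumes pandas plus an Excel engine such as openpyxl are installed.

```python
import zipfile
from pathlib import Path

# Leading bytes of the legacy .xls (OLE2 compound file) container.
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def sniff_format(path):
    """Guess the real format from the file's content, not its extension.

    Returns 'xlsx', 'zip', 'xls', or 'text'. (Hypothetical helper.)
    """
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            # An .xlsx workbook is a ZIP archive containing xl/workbook.xml.
            if "xl/workbook.xml" in zf.namelist():
                return "xlsx"
        return "zip"
    with open(path, "rb") as fh:
        head = fh.read(len(OLE2_MAGIC))
    if head == OLE2_MAGIC:
        return "xls"
    return "text"


def excel_to_csvs(path, out_dir="."):
    """Write each sheet of a workbook to its own CSV file. (Hypothetical helper.)"""
    import pandas as pd  # imported lazily so sniff_format works without pandas

    # sheet_name=None returns a dict mapping sheet name -> DataFrame.
    sheets = pd.read_excel(path, sheet_name=None)
    written = []
    for name, frame in sheets.items():
        # Sheet names may contain characters that are invalid in file names;
        # this sketch does not sanitize them.
        target = Path(out_dir) / f"{Path(path).stem}_{name}.csv"
        frame.to_csv(target, index=False)
        written.append(target)
    return written
```

Calling sniff_format on the file from the question before choosing between read_csv and read_excel would show whether a file named .csv is really a workbook. If only one sheet is needed, pd.read_excel(path) on its own returns the first sheet as a DataFrame.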