Tags: python, excel, pandas, xlsx, xlrd

AssertionError with pandas when reading Excel


I'm trying to read an xlsx file into Python using pandas.
I've done this thousands of times before, but for some reason it doesn't work with one particular file.

The file was downloaded from another source, and I get an AssertionError (see the traceback at the end) when reading it with pandas:

    df = pandas.read_excel(pathtomyfile, sheetname="Sheet1")

The path variable is defined, and the path exists (os.path.exists(path) returns True).

If I copy the contents of the file and paste only the values into a new Excel workbook, read_excel() opens the new file without problems.

If I copy the contents and paste only the formatting into a new workbook, read_excel() opens that file too.

So the problem doesn't seem to be the values or the formatting.

Could this be an encoding issue?
Thank you for any help.
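One way to test the encoding guess without going through pandas: an .xlsx file is a ZIP archive of XML parts, and each part normally states its encoding in its XML declaration. The sketch below lists what each part declares. The helper name declared_encodings is hypothetical, and it only recognizes ASCII-readable declarations in the first 200 bytes of each part.

```python
import re
import zipfile

# Matches the encoding pseudo-attribute of an XML declaration such as
# <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
_XML_DECL_ENCODING = re.compile(rb"<\?xml[^>]*encoding=['\"]([\w.-]+)['\"]")


def declared_encodings(xlsx_path):
    """Map each XML part of an .xlsx package to its declared encoding.

    An .xlsx file is a ZIP archive of XML parts. A value of None means no
    ASCII-readable encoding declaration was found in the first 200 bytes
    (no declaration at all, or e.g. a UTF-16 part whose bytes are not
    ASCII-compatible). Without a declaration, XML defaults to UTF-8 unless
    a byte-order mark says otherwise.
    """
    found = {}
    with zipfile.ZipFile(xlsx_path) as zf:
        for name in zf.namelist():
            # Relationship files (.rels) are XML too.
            if not name.endswith((".xml", ".rels")):
                continue
            head = zf.read(name)[:200]
            match = _XML_DECL_ENCODING.search(head)
            found[name] = match.group(1).decode("ascii") if match else None
    return found
```

For example, the output of declared_encodings(pathtomyfile) for the problem file could be compared with the same call on one of the re-pasted copies.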

        df1 = pandas.read_excel(snap1)
      File "C:\Python\python-3.4.4.amd64\lib\site-packages\pandas\io\excel.py", line 163, in read_excel
        io = ExcelFile(io, engine=engine)
      File "C:\Python\python-3.4.4.amd64\lib\site-packages\pandas\io\excel.py", line 206, in __init__
        self.book = xlrd.open_workbook(io)
      File "C:\Python\python-3.4.4.amd64\lib\site-packages\xlrd\__init__.py", line 422, in open_workbook
        ragged_rows=ragged_rows,
      File "C:\Python\python-3.4.4.amd64\lib\site-packages\xlrd\xlsx.py", line 794, in open_workbook_2007_xml
        x12sheet.process_stream(zflo, heading)
      File "C:\Python\python-3.4.4.amd64\lib\site-packages\xlrd\xlsx.py", line 531, in own_process_stream
        self_do_row(elem)
      File "C:\Python\python-3.4.4.amd64\lib\site-packages\xlrd\xlsx.py", line 597, in do_row
        assert 0 <= self.rowx < X12_MAX_ROWS
    AssertionError
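The assertion that fails, 0 <= self.rowx < X12_MAX_ROWS, checks the row index xlrd derives from each <row> element of the sheet XML. A rough sketch that mirrors that check, to see which rows would trip it, is below. It assumes X12_MAX_ROWS is 2**20 (the Excel 2007+ row limit), that xlrd computes rowx as int(r) - 1 and otherwise increments it when the optional r attribute is missing, and that the sheet of interest is stored at xl/worksheets/sheet1.xml; find_bad_rows is a hypothetical helper name.

```python
import zipfile
import xml.etree.ElementTree as ET

# Assumption: the limit used by xlrd's xlsx reader (2**20 = 1,048,576 rows,
# the Excel 2007+ maximum).
X12_MAX_ROWS = 2 ** 20
# <row> elements live in the SpreadsheetML main namespace.
ROW_TAG = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}row"


def find_bad_rows(xlsx_path, sheet_part="xl/worksheets/sheet1.xml"):
    """Return (r_attribute, rowx) for each <row> whose index would fail
    0 <= rowx < X12_MAX_ROWS.

    Mirrors (as an assumption) how xlrd derives rowx: int(r) - 1 when the
    optional r attribute is present, otherwise the previous rowx plus one,
    starting from -1. A non-numeric r raises ValueError here.
    """
    bad = []
    rowx = -1
    with zipfile.ZipFile(xlsx_path) as zf, zf.open(sheet_part) as stream:
        for _event, elem in ET.iterparse(stream):
            if elem.tag != ROW_TAG:
                continue
            r = elem.get("r")
            rowx = rowx + 1 if r is None else int(r) - 1
            if not 0 <= rowx < X12_MAX_ROWS:
                bad.append((r, rowx))
            elem.clear()  # free the row's cells once it has been checked
    return bad
```

Calling find_bad_rows(pathtomyfile) returns an empty list when no row in that part would trip the check; since sheet1.xml is not guaranteed to be the sheet named "Sheet1", other parts under xl/worksheets/ may need checking too.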

Solution

  • The file's text contained Korean characters, which needed a different encoding. Passing the encoding parameter to read_excel() resolved the issue.

    df = pandas.read_excel(pathtomyfile, sheetname = "Sheet1", encoding="utf-16")
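To confirm that a workbook actually contains Korean text, the shared-strings part of the package can be inspected directly. This is a minimal sketch, assuming the text is stored in xl/sharedStrings.xml (inline strings inside the sheet parts are not checked) and counting only the Hangul Syllables block, U+AC00 to U+D7A3; contains_hangul is a hypothetical helper name.

```python
import zipfile
import xml.etree.ElementTree as ET

# Hangul Syllables Unicode block (precomposed Korean syllables).
HANGUL_FIRST, HANGUL_LAST = 0xAC00, 0xD7A3


def contains_hangul(xlsx_path, part="xl/sharedStrings.xml"):
    """Return True if the shared-strings part contains any Hangul syllable.

    Only the shared-strings table is checked; inline strings stored in the
    sheet parts themselves are ignored in this sketch.
    """
    with zipfile.ZipFile(xlsx_path) as zf:
        if part not in zf.namelist():
            return False
        # ElementTree honors the part's own XML encoding declaration.
        root = ET.fromstring(zf.read(part))
    return any(
        HANGUL_FIRST <= ord(ch) <= HANGUL_LAST
        for text in root.itertext()
        for ch in text
    )
```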