I'm trying to read an xlsx file into python using pandas.
I've done this thousands of times before, but for some reason it is not working with one particular file.
The file is downloaded from another source and I get an AssertionError (see end) when reading with pandas:
df = pandas.read_excel(pathtomyfile, sheetname = "Sheet1")
The path variable is defined and the path exists (os.path.exists(path) returns True).
When I copy the contents of the file and paste only the values into a new Excel document, the new file opens with the read_excel() method.
When I copy the contents of the file and paste only the formatting into a new Excel document, the new file also opens with the read_excel() method.
So neither the values nor the formatting seems to be the cause.
I am guessing this could be an encoding issue?
Thank you for any help.
df1 = pandas.read_excel(snap1)
  File "C:\Python\python-3.4.4.amd64\lib\site-packages\pandas\io\excel.py", line 163, in read_excel
    io = ExcelFile(io, engine=engine)
  File "C:\Python\python-3.4.4.amd64\lib\site-packages\pandas\io\excel.py", line 206, in __init__
    self.book = xlrd.open_workbook(io)
  File "C:\Python\python-3.4.4.amd64\lib\site-packages\xlrd\__init__.py", line 422, in open_workbook
    ragged_rows=ragged_rows,
  File "C:\Python\python-3.4.4.amd64\lib\site-packages\xlrd\xlsx.py", line 794, in open_workbook_2007_xml
    x12sheet.process_stream(zflo, heading)
  File "C:\Python\python-3.4.4.amd64\lib\site-packages\xlrd\xlsx.py", line 531, in own_process_stream
    self_do_row(elem)
  File "C:\Python\python-3.4.4.amd64\lib\site-packages\xlrd\xlsx.py", line 597, in do_row
    assert 0 <= self.rowx < X12_MAX_ROWS
AssertionError
The file contained Korean characters in the text, and these needed a different encoding. Passing the "encoding" parameter to the read_excel() method resolved the issue:
df = pandas.read_excel(pathtomyfile, sheetname = "Sheet1", encoding="utf-16")
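The assertion at the bottom of the traceback compares a row index against a row limit, so for a file that fails this way it may also be worth checking which row numbers the sheet XML actually declares. Below is a minimal sketch using only the standard library. It assumes the file is an ordinary .xlsx zip archive with its sheets stored under xl/worksheets/. The helper name max_declared_row and the constant XLSX_MAX_ROWS are made up for illustration, and the value 2**20 is the Excel 2007+ row limit, which is presumably what the limit in the traceback refers to.

```python
import re
import zipfile

# Excel 2007+ worksheets hold at most 2**20 (1,048,576) rows.
# Assumption: this is the limit the assertion in the traceback checks.
XLSX_MAX_ROWS = 2 ** 20

# Matches the r="N" attribute of a <row ...> element in worksheet XML.
# The leading whitespace ensures only the attribute named exactly "r" is used.
ROW_ATTR = re.compile(rb'<(?:\w+:)?row\b[^>]*?\sr="(\d+)"')


def max_declared_row(xlsx_path):
    """Return {sheet member name: highest 1-based row number declared}.

    Hypothetical helper: xlsx_path may be a file path or a binary
    file-like object, since zipfile.ZipFile accepts both.
    """
    result = {}
    with zipfile.ZipFile(xlsx_path) as archive:
        for name in archive.namelist():
            if name.startswith("xl/worksheets/") and name.endswith(".xml"):
                data = archive.read(name)  # whole sheet is read into memory
                rows = [int(value) for value in ROW_ATTR.findall(data)]
                result[name] = max(rows) if rows else 0
    return result
```

Calling max_declared_row(pathtomyfile) and comparing each value with XLSX_MAX_ROWS shows whether any sheet declares a row number beyond the limit. If one does, the file was probably written by a tool that produced out-of-range row numbers, which would also fit the observation that pasting the contents into a fresh workbook makes the error go away.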