Problem:
I am opening a .xls file with pd.read_excel, but I get an error.
df_cima = pd.read_excel("docs/Presentaciones.xls")
xlrd.biffh.XLRDError: Excel xlsx file; not supported
The suffix of this file is .xls, but the error tells me that it is .xlsx.
Then I tried adding engine="openpyxl", which is usually used for reading .xlsx files when the xlrd version is newer than 1.2.0, but that gives me another error:
openpyxl.utils.exceptions.InvalidFileException: openpyxl does not support the old .xls file format, please use xlrd to read this file, or convert it to the more recent .xlsx file format.
MY env:
I do not want to change my xlrd version back to 1.2.0. From another answer I see that newer versions of xlrd support only .xls, but I don't understand why it is not working for my file.
Thanks in advance.
With Pandas 1.1.5 and xlrd 2.1.0
Rename Presentaciones.xls to Presentaciones.xlsx.
import pandas as pd
# Use openpyxl.
df = pd.read_excel(r'X:...\Presentaciones.xlsx', engine='openpyxl')
print(df)
Enjoy! :)
How do I know that your file is a fake .xls and a very real .xlsx?
Because openpyxl doesn't work with xls files.
import pandas as pd
df = pd.read_excel(r'X:...\test.xls', engine='openpyxl')
# ERROR:
# InvalidFileException: openpyxl does not support the old .xls file format,
# please use xlrd to read this file, or convert it to the more recent .xlsx file format.
And trying to simply rename test.xls to test.xlsx does not work either!
import pandas as pd
df = pd.read_excel(r'X:...\test.xlsx', engine='openpyxl')
# ERROR:
# OSError: File contains no valid workbook part
Beware: the .xlsx format (which is what was detected here, despite the .xls name) means there may be scripts in this file. Sometimes the extension can lie, so be careful!
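Since the extension can lie, one way to check what a file really is before choosing an engine is to look at its first bytes. The following is only a minimal sketch, not how pandas or xlrd do it internally: guess_excel_engine is a hypothetical helper name, and it assumes that a legacy .xls starts with the 8-byte OLE2 compound-file signature while an .xlsx is a ZIP archive.

```python
import zipfile

# Leading bytes of an OLE2 compound file, the container used by legacy .xls.
OLE2_SIGNATURE = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def guess_excel_engine(path):
    """Hypothetical helper: guess an engine from the file content, not its name."""
    with open(path, "rb") as fh:
        header = fh.read(8)
    if header == OLE2_SIGNATURE:
        return "xlrd"       # looks like a genuine legacy .xls
    if zipfile.is_zipfile(path):
        return "openpyxl"   # ZIP container, as used by .xlsx
    return None             # neither container recognised
```

For a file like the one in the question, this would presumably return "openpyxl" even though the name ends in .xls. Note that the check only identifies the container: any ZIP archive passes, so it does not prove the file is a valid workbook.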
The reason why xlrd stopped supporting xlsx files (as of version 2.0) is that its xlsx code was considered a security hazard and no one was maintaining that part of the code. pandas itself can still read xlsx files through openpyxl.