I'm getting the following error when I call pd.read_excel(). The error is specific to my computer: when I run the script on a different computer with the same files, no error occurs. I'm using the Anaconda distribution of Python 3.6.1, pandas version '0.20.3', and xlrd version '1.1.0':
XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\x08jstanle'
Jstanley is my computer name, so that may be a big hint that I'm missing.
The script is trying to open either a .xls or .xlsx file. I've tried changing the extensions between the old and new Excel formats, with no change.
Thanks for the help! I'll put bits of the full error below.
C:\Users\jstanley\Documents\----\---\Python\load_data_original.py in load_(exp_id, file_path)
60
61 def load_(exp_id, file_path):
---> 62 dict_sheets = pd.read_excel(file_path, header=None, sheetname=None)
63 new_dict_sheets = dict()
64
C:\Users\jstanley\Anaconda3\lib\site-packages\pandas\io\excel.py in read_excel(io, sheetname, header, skiprows, skip_footer, index_col, names, parse_cols, parse_dates, date_parser, na_values, thousands, convert_float, has_index_names, converters, dtype, true_values, false_values, engine, squeeze, **kwds)
201
202 if not isinstance(io, ExcelFile):
--> 203 io = ExcelFile(io, engine=engine)
204
205 return io._parse_excel(
C:\Users\jstanley\Anaconda3\lib\site-packages\pandas\io\excel.py in __init__(self, io, **kwds)
258 self.book = xlrd.open_workbook(file_contents=data)
259 elif isinstance(io, compat.string_types):
--> 260 self.book = xlrd.open_workbook(io)
261 else:
262 raise ValueError('Must explicitly set engine if not passing in'
C:\Users\jstanley\Anaconda3\lib\site-packages\xlrd\__init__.py in open_workbook(filename, logfile, verbosity, use_mmap, file_contents, encoding_override, formatting_info, on_demand, ragged_rows)
C:\Users\jstanley\Anaconda3\lib\site-packages\xlrd\book.py in open_workbook_xls(filename, logfile, verbosity, use_mmap, file_contents, encoding_override, formatting_info, on_demand, ragged_rows)
89 t1 = time.clock()
90 bk.load_time_stage_1 = t1 - t0
---> 91 biff_version = bk.getbof(XL_WORKBOOK_GLOBALS)
92 if not biff_version:
93 raise XLRDError("Can't determine file's BIFF version")
C:\Users\jstanley\Anaconda3\lib\site-packages\xlrd\book.py in getbof(self, rqd_stream)
1228 elif rc == XL_NAME:
1229 self.handle_name(data)
-> 1230 elif rc == XL_PALETTE:
1231 self.handle_palette(data)
1232 elif rc == XL_STYLE:
C:\Users\jstanley\Anaconda3\lib\site-packages\xlrd\book.py in bof_error(msg)
1222 elif rc == XL_SHEETSOFFSET:
1223 self.handle_sheetsoffset(data)
-> 1224 elif rc == XL_SHEETHDR:
1225 self.handle_sheethdr(data)
1226 elif rc == XL_SUPBOOK:
There seem to be issues between some Excel files and xlrd, and it's often hard to tell which one you're facing. Is the file something you downloaded? Or perhaps an old file? Corruption sneaks into Excel files in seemingly random ways.
This question might help. Also, look through this page for other ideas.
The best solution seems to be opening the file in Excel, then saving it as another format (sometimes even just as a new .xlsx file). Manual, inelegant, and annoying. But I've had to do it several times and it's worked.
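If you want to check what the file actually contains before re-saving it, here is a minimal sketch. It is not part of xlrd or pandas, and the helper names `classify_header`, `sniff_excel_format`, and `is_lock_file` are made up for illustration. It compares the first bytes of the file against the standard signatures for OLE2 containers (legacy .xls) and zip archives (.xlsx). It also flags file names starting with `~$`, the prefix Excel uses for the temporary owner files it creates next to an open workbook. As far as I know, those files begin with a length byte followed by the user name, which would fit the b'\x08jstanle' in your error, but treat that as a guess to verify on your machine.

```python
import os

# Well-known container signatures (facts about the file formats, not xlrd internals).
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy .xls stored as an OLE2 compound file
ZIP_MAGIC = b"PK\x03\x04"                          # .xlsx, which is a zip archive


def classify_header(header):
    """Guess the container type from the first bytes of a file."""
    if header.startswith(OLE2_MAGIC):
        return "xls"
    if header.startswith(ZIP_MAGIC):
        return "xlsx"
    return "unknown"


def sniff_excel_format(path):
    """Read the first 8 bytes of `path` and classify them."""
    with open(path, "rb") as fh:
        return classify_header(fh.read(8))


def is_lock_file(path):
    """True if the file name has the '~$' prefix Excel gives its temporary owner files."""
    return os.path.basename(path).startswith("~$")
```

If `sniff_excel_format(file_path)` returns "unknown" or `is_lock_file(file_path)` is true, the path is probably not pointing at a real workbook, and skipping it (or closing the workbook in Excel on that machine) may be enough. Very old .xls files that are not wrapped in an OLE2 container would also come back as "unknown", so this is only a first check.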