
XLRD Error via Pandas


I'm getting the following error when I call pd.read_excel(). The error is specific to my computer: when I run the script on a different computer with the same files, no error occurs. I'm using the Anaconda distribution of Python 3.6.1, with pandas 0.20.3 and xlrd 1.1.0:

XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\x08jstanle'

Jstanley is my computer name, so that may be a big hint that I'm missing.

The script is trying to open either a .xls or a .xlsx file. I've tried switching the extension between the old and new Excel formats, with no change.

Thanks for the help! I'll put bits of the full error below.

C:\Users\jstanley\Documents\----\---\Python\load_data_original.py in load_(exp_id, file_path)
     60 
     61 def load_(exp_id, file_path):
---> 62     dict_sheets = pd.read_excel(file_path, header=None, sheetname=None)
     63     new_dict_sheets = dict()
     64     

C:\Users\jstanley\Anaconda3\lib\site-packages\pandas\io\excel.py in read_excel(io, sheetname, header, skiprows, skip_footer, index_col, names, parse_cols, parse_dates, date_parser, na_values, thousands, convert_float, has_index_names, converters, dtype, true_values, false_values, engine, squeeze, **kwds)
    201 
    202     if not isinstance(io, ExcelFile):
--> 203         io = ExcelFile(io, engine=engine)
    204 
    205     return io._parse_excel(

C:\Users\jstanley\Anaconda3\lib\site-packages\pandas\io\excel.py in __init__(self, io, **kwds)
    258             self.book = xlrd.open_workbook(file_contents=data)
    259         elif isinstance(io, compat.string_types):
--> 260             self.book = xlrd.open_workbook(io)
    261         else:
    262             raise ValueError('Must explicitly set engine if not passing in'

C:\Users\jstanley\Anaconda3\lib\site-packages\xlrd\__init__.py in open_workbook(filename, logfile, verbosity, use_mmap, file_contents, encoding_override, formatting_info, on_demand, ragged_rows)

C:\Users\jstanley\Anaconda3\lib\site-packages\xlrd\book.py in open_workbook_xls(filename, logfile, verbosity, use_mmap, file_contents, encoding_override, formatting_info, on_demand, ragged_rows)
     89         t1 = time.clock()
     90         bk.load_time_stage_1 = t1 - t0
---> 91         biff_version = bk.getbof(XL_WORKBOOK_GLOBALS)
     92         if not biff_version:
     93             raise XLRDError("Can't determine file's BIFF version")

C:\Users\jstanley\Anaconda3\lib\site-packages\xlrd\book.py in getbof(self, rqd_stream)
   1228             elif rc == XL_NAME:
   1229                 self.handle_name(data)
-> 1230             elif rc == XL_PALETTE:
   1231                 self.handle_palette(data)
   1232             elif rc == XL_STYLE:

C:\Users\jstanley\Anaconda3\lib\site-packages\xlrd\book.py in bof_error(msg)
   1222             elif rc == XL_SHEETSOFFSET:
   1223                 self.handle_sheetsoffset(data)
-> 1224             elif rc == XL_SHEETHDR:
   1225                 self.handle_sheethdr(data)
   1226             elif rc == XL_SUPBOOK:

Solution

  • xlrd has trouble with some Excel files, and it's often hard to tell which problem you're facing. Is the file something you downloaded? Or perhaps an old file? Corruption sneaks into Excel files in seemingly random ways.

    This question might help. Also, look through this page for other ideas.

    The best solution seems to be opening the file in Excel, then saving it as another format (sometimes even just as a new .xlsx file). Manual, inelegant, and annoying. But I've had to do it several times and it's worked.
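
Before re-saving anything, it may be worth checking what the file actually contains. The sketch below reads the first bytes and compares them with the usual signatures: legacy .xls files are OLE2 compound documents and .xlsx files are zip archives. It also flags names starting with "~$", which Excel uses for the temporary lock file it creates next to an open workbook. As far as I know, that lock file begins with a length byte followed by the user name, which would be consistent with the b'\x08jstanle' in the error, but treat that as a guess rather than a diagnosis. The function name sniff_excel_file and its return labels are made up for this example.

```python
from pathlib import Path

# Leading bytes of the two common Excel containers.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy .xls (OLE2 compound file)
ZIP_MAGIC = b"PK\x03\x04"                        # .xlsx (zip archive)


def sniff_excel_file(path):
    """Return a rough label for what `path` looks like (hypothetical helper).

    Possible results: "lock file", "xls", "xlsx", "unknown".
    """
    path = Path(path)

    # Excel creates "~$<name>" next to a workbook while it is open;
    # it is not a workbook and xlrd cannot parse it.
    if path.name.startswith("~$"):
        return "lock file"

    # Only the first few bytes are needed to tell the formats apart.
    with open(path, "rb") as fh:
        head = fh.read(8)

    if head.startswith(OLE2_MAGIC):
        return "xls"
    if head.startswith(ZIP_MAGIC):
        return "xlsx"
    return "unknown"
```

Calling sniff_excel_file(file_path) just before pd.read_excel(file_path, ...) in load_ should show whether the path being opened is a real workbook. If it reports "lock file" or "unknown", closing the workbook in Excel or skipping "~$" names when collecting paths is a cheaper thing to try than re-saving the file.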