python pandas xlsm

Pandas read_excel() fails with xlrd.biffh.XLRDError: Can't find workbook in OLE2 compound document


I'm trying to use pandas to parse an .xlsm document. My code worked perfectly with the example file I was given, but once I got the rest of the documents, it failed with the above error. Here's the offending stack trace:

Traceback (most recent call last):
  File "@@@@@@@@/UnsupervisedCAM.py", line 9, in <module>
    info_dict = read_excel_to_dict('files/' + filename)
  File "@@@@@@@@\readCAM.py", line 7, in read_excel_to_dict
    df = pandas.read_excel(filename, parse_cols='E,G,I,K,Q,O')
  File "@@@@@@@@\Anaconda3\envs\tensorflow\lib\site-packages\pandas\io\excel.py", line 191, in read_excel
    io = ExcelFile(io, engine=engine)
  File "@@@@@@@@\Anaconda3\envs\tensorflow\lib\site-packages\pandas\io\excel.py", line 249, in __init__
    self.book = xlrd.open_workbook(io)
  File "@@@@@@@@\Anaconda3\envs\tensorflow\lib\site-packages\xlrd\__init__.py", line 441, in open_workbook
    ragged_rows=ragged_rows,
  File "@@@@@@@@\Anaconda3\envs\tensorflow\lib\site-packages\xlrd\book.py", line 87, in open_workbook_xls
    ragged_rows=ragged_rows,
  File "@@@@@@@@\Anaconda3\envs\tensorflow\lib\site-packages\xlrd\book.py", line 595, in biff2_8_load
    raise XLRDError("Can't find workbook in OLE2 compound document")
xlrd.biffh.XLRDError: Can't find workbook in OLE2 compound document

I'm not even sure where to start... Haven't found anything of use online.


Solution

  • After a lot of searching, the only fix I've found is to open and re-save each Excel document, which seems to 'strip' them of their OLE2 format. I automated the process with the following VBScript:

    ' Open and re-save every matching workbook in a folder.
    Dim objFSO, objFolder, objFile
    Dim objExcel, objWB
    Dim myFolder
    Set objExcel = CreateObject("Excel.Application")
    Set objFSO = CreateObject("Scripting.FileSystemObject")
    myFolder = "<PATH/TO/FILES>"
    Set objFolder = objFSO.GetFolder(myFolder)
    For Each objFile In objFolder.Files
        ' Right(..., 4) compares the last four characters of the file name.
        If Right(objFile.Name, 4) = "<EXTENSION>" Then
            Set objWB = objExcel.Workbooks.Open(objFile.Path)
            objWB.Save
            objWB.Close
        End If
    Next
    objExcel.Quit
    Set objExcel = Nothing
    Set objFSO = Nothing
    WScript.Echo "Done"
    

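    Make sure to replace the folder path and the extension placeholders before running the script. The script compares the last four characters of each file name, so the extension must be exactly four characters (for example, xlsm without the dot).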
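
Before re-saving everything, it may help to see which files are actually affected. The traceback shows xlrd taking its .xls (OLE2) code path, while a regular .xlsm file is a ZIP archive, so checking the first bytes of each file should separate the two groups. This is a minimal sketch, not part of the original answer: the function names `sniff_container` and `group_by_container` are made up for illustration, and it only inspects the well-known OLE2 and ZIP file signatures. It does not explain why a given file ended up in OLE2 form.

```python
from pathlib import Path

# File signatures ("magic bytes") at offset 0.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2 compound document
ZIP_MAGIC = b"PK\x03\x04"                       # ZIP archive (.xlsx/.xlsm)


def sniff_container(path):
    """Return 'ole2', 'zip', or 'unknown' based on the file's first bytes."""
    with open(path, "rb") as fh:
        head = fh.read(8)
    if head.startswith(OLE2_MAGIC):
        return "ole2"
    if head.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"


def group_by_container(folder, pattern="*.xlsm"):
    """Map each container type to the sorted file names found in `folder`."""
    groups = {"ole2": [], "zip": [], "unknown": []}
    for path in sorted(Path(folder).glob(pattern)):
        groups[sniff_container(path)].append(path.name)
    return groups
```

Calling `group_by_container("files")` on the folder from the question would list the files under each container type. Under the assumption above, the ones reported as `ole2` are the candidates for the open-and-save workaround.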