Search code examples
pythonmacosimport-from-excel

repairing Excel file


I have about 1500 excel spreadsheet files downloaded from movescount.com - these contain heart rate data taken over a few years. I want to process these files with a bit of Python + openpyxl. The problem is that opening the excel file with openpyxl fails. Specifically it complains that a styles.xml definition is missing. Also Apple’s Numbers cannot open the file. When I open the file with MS Excel (which works) and simply save the file right away, I can open and process it with python and openpyxl just fine.

Clearly, MS Excel is repairing some format issue. But I would like to avoid having to open and save the file with MS Excel - particularly because that means I need to do this on an MS Windows box. What is the best way to repair these files on the fly on my OS X machine?


Solution

  • An Excel xlsx file is a zipped archive of files. So you can write the missing file to the xlsx archive provided you have one good file to start with.

    import zipfile
    
    with zipfile.ZipFile('/path/to/goodfile.xlsx', 'r') as zgood:
        styles_xml = zgood.read('xl/styles.xml')
    
    with zipfile.ZipFile('/path/to/badfile.xlsx', 'a') as zbad:
        zbad.writestr('xl/styles.xml', styles_xml)