Search code examples
pythonpython-3.xexcelopenpyxlpython-zipfile

Using openpyxl module to write to spreadsheet creates a damaged spreadsheet, how to fix with zipfile module?


I have a program which writes to a spreadsheet using openpyxl. Upon execution of the program, the cells are filled as expected but the spreadsheet becomes damaged. Excel repairs the spreadsheet and I can then view it again.

import openpyxl
from openpyxl import load_workbook
amounts, row = [1, 2, 3, 4, 5], 2
book = load_workbook("output.xlsx")
sheet = book.active

for i, value in enumerate(amounts):
    sheet.cell(column=i+1, row=row, value=value)  
print ("Sheet updating complete.")
book.save("output.xlsx")

I have tried using the Open XML SDK Productivity Tool from Microsoft to compare a good and bad file with each other and noticed that styles.xml is missing. I try to copy this over using the following source code I have obtained from another question, but it does not solve the issue for me.

import zipfile
with zipfile.ZipFile('outputcopy.xlsx', 'r') as zgood:
    styles_xml = zgood.read('xl/styles.xml')
with zipfile.ZipFile('output.xlsx', 'a') as zbad:
    zbad.writestr('xl/styles.xml', styles_xml)

I can confirm from the repair log Excel generates, that the problem is with xl/styles.xml. I need to copy this xml file from the good copy, to the bad copy.

How can I get the xl/styles.xml file copied so that the program can run without damaging output.xlsx?

I have made another attempt to fix this issue. In the off chance that styles.xml cannot be copied from a different Excel file; I have opened styles.xml from output.xlsx prior to book.save("output.xlsx"). After saving, I then get the styles.xml from before the save statement, and write it. Unfortunately, this has not changed anything and I am still getting a damaged Excel file. With this attempt, my test code looks like this:

import openpyxl
import zipfile

from openpyxl import load_workbook
amounts, indexValue, row = [1, 2, 3, 4, 5], 0, 2
book = load_workbook("output.xlsx")
sheet = book.active

for i, value in enumerate(amounts):
    sheet.cell(column=i+1, row=row, value=value)  
print ("Sheet updating complete.")

with zipfile.ZipFile('output.xlsx', 'r') as zgood:
    styles_xml = zgood.read('xl/styles.xml')

book.save("output.xlsx")

with zipfile.ZipFile('output.xlsx', 'a') as zbad:
    zbad.writestr('xl/styles.xml', styles_xml)

I have tried saving as a completely new Excel File, but still have the same issue. I tried using zip file to open from output.xlsx and writing to the newly saved file, but still no result.

import openpyxl
import zipfile
from openpyxl import load_workbook

amounts, indexValue, row, cell = [1, 2, 3, 4, 5], 0, 2, "A2"
book = load_workbook("output.xlsx")
sheet = book.active

while indexValue != 5:
    sheet[cell] = amounts[indexValue]
    indexValue += 1
    cell = chr(ord(cell[0]) + 1) + str(cell[1])
print ("Sheet updating complete.")

book.save("test.xlsx")

with zipfile.ZipFile('output.xlsx', 'r') as zgood:
    styles_xml = zgood.read('xl/styles.xml')
with zipfile.ZipFile('test.xlsx', 'a') as zbad:
    zbad.writestr('xl/styles.xml', styles_xml)

Although I have already fixed this issue, it is worth noting that this problem only seems to occur when loading a workbook. I have created another program with spreadsheets that creates a workbook, rather than loading it. As a result of this, the spreadsheet does not saves damaged.


Solution

  • After confirming that the issue was with styles.xml, I identified that the issue was most likely with the style formatting of the written cells. By using styles from the openpyxl module, I have fixed the issue.

    I declare a variable, fontStyle in this case, and set all the style settings:

    fontStyle = Font(name="Calibri", size=12, color=colors.BLACK)
    

    When writing amounts to each cell, I also set the style of these cells using fontStyle:

    sheet[cell].font = fontStyle
    

    The completed code, looks like this:

    import openpyxl
    from openpyxl import load_workbook
    from openpyxl.styles import colors
    from openpyxl.styles import Font, Color
    
    fontStyle = Font(name="Calibri", size=12, color=colors.BLACK)
    amounts, indexValue, cell = [1, 2, 3, 4, 5], 0, "A2"
    book = load_workbook("output.xlsx")
    sheet = book.active
    
    while indexValue != 5:
        sheet[cell] = amounts[indexValue]
        sheet[cell].font = fontStyle
        indexValue += 1
        cell = chr(ord(cell[0]) + 1) + str(cell[1])
    
    print ("Sheet updating complete.")
    book.save("output.xlsx")
    

    I believe this has worked because the writing method has no default style settings. This would explain why styles.xml was missing when using the Open XML SDK Productivity Tool. Upon checking this Excel file again after the fix, I can confirm that styles.xml is no longer missing.

    The file is no longer damaged upon being saved and can be opened normally again. Also, I am now able to execute this script to write to the Excel file again, without having to open and close to repair it.

    Note that I have also changed my loop from the original loop - as part of one of my attempts to fix the issue. This has not had an effect on the final outcome - it is all down to the styling of the cells written.

    This doesn't exactly answer the question of solving the issue specifically with zipfile but it does solve the problem.