Search code examples
python-3.xexcelpandasxlsxwritercorrupt

xlsxwriter formatting is corrupting my excel file


Question, i'm trying to apply some formatting on an xlsx file using xlsxwriter but Excel says my exported file is corrupt.

I believe the issue stems from this line : format1 = workbook.add_format({'font_color': 'red'}) but I do not understand why.

I have updated MS Excel to the latest version and have tried opening the file on another computer but I keep getting this error: We found a problem with some content in 'export_top200format.xlsx'. Do you want us to try to recover as much as we can? If you trust the source of this workbook, click Yes.

Followed by this error:

Excel was able to open the file by repairing or removing the unreadable content.

import pandas as pd
import os
import xlsxwriter

# IMPORTING dfCAtop200_CLEANED
ca_top200_CLEANED = 'export_top200CLEANEDTEST.xlsx'

# READING 'export_top200CLEANEDTEST.xlsx' FILE
dfCAtop200_Formatted = pd.read_excel(
    ca_top200_CLEANED)

column_list = dfCAtop200_Formatted.columns

writer = pd.ExcelWriter(
    '/Users/BTS/Documents/python_work/ABC Charts/export_top200FORMATTED.xlsx', engine='xlsxwriter')

dfCAtop200_Formatted.to_excel(writer, index=False)

# to remove source header formatting

for idx, val in enumerate(column_list):
    worksheet.write(0, idx, val)

# workbook stuff

workbook = xlsxwriter.Workbook('export_top200FORMATTED.xlsx')
worksheet = writer.sheets['Sheet1']

format1 = workbook.add_format({'font_color': 'red'})

worksheet.set_column('B:B', 11, format1)


writer.save()

Solution

  • The issue is that the program is overwriting the xlsx file created by Pandas with a new one created by XlsxWriter while trying to use the worksheet created by Pandas to add formatting. The issue is here:

    workbook = xlsxwriter.Workbook('export_top200FORMATTED.xlsx')
    worksheet = writer.sheets['Sheet1']
    

    The correct way to access a Pandas created workbook or worksheet is shown in the XlsxWriter documentation on Working with Python Pandas and XlsxWriter.

    Here is a working example based on your code. It also fixes a issue in the code above where the first row of the dataframe data is overwritten:

    import pandas as pd
    
    # Create a Pandas dataframe from some data.
    data = [10, 20, 30, 40, 50, 60]
    df = pd.DataFrame({'Foo': data,
                       'Bar' : data,
                       'Baz' : data})
    
    # Create a Pandas Excel writer using XlsxWriter as the engine.
    writer = pd.ExcelWriter("formatting.xlsx", engine='xlsxwriter')
    
    # Convert the dataframe to an XlsxWriter Excel object. Note that we turn off
    # the default header and skip one row to allow us to insert a user defined
    # header.
    df.to_excel(writer, sheet_name='Sheet1', startrow=1, header=False)
    
    # Get the xlsxwriter workbook and worksheet objects.
    workbook  = writer.book
    worksheet = writer.sheets['Sheet1']
    
    # Rewrite the column headers without formatting.
    for col_num, value in enumerate(df.columns.values):
        worksheet.write(0, col_num + 1, value)
    
    # Add a format to column B.
    format1 = workbook.add_format({'font_color': 'red'})
    
    worksheet.set_column('B:B', 11, format1)
    
    # Close the Pandas Excel writer and output the Excel file.
    writer.save()
    
    

    Output:

    enter image description here