python-3.x excel pandas xlsxwriter corrupt

xlsxwriter formatting is corrupting my excel file

Question, i'm trying to apply some formatting on an xlsx file using xlsxwriter but Excel says my exported file is corrupt.

I believe the issue stems from this line : format1 = workbook.add_format({'font_color': 'red'}) but I do not understand why.

I have updated MS Excel to the latest version and have tried opening the file on another computer but I keep getting this error: We found a problem with some content in 'export_top200format.xlsx'. Do you want us to try to recover as much as we can? If you trust the source of this workbook, click Yes.

Followed by this error:

Excel was able to open the file by repairing or removing the unreadable content.

import pandas as pd
import os
import xlsxwriter

# IMPORTING dfCAtop200_CLEANED
ca_top200_CLEANED = 'export_top200CLEANEDTEST.xlsx'

# READING 'export_top200CLEANEDTEST.xlsx' FILE
dfCAtop200_Formatted = pd.read_excel(
    ca_top200_CLEANED)

column_list = dfCAtop200_Formatted.columns

writer = pd.ExcelWriter(
    '/Users/BTS/Documents/python_work/ABC Charts/export_top200FORMATTED.xlsx', engine='xlsxwriter')

dfCAtop200_Formatted.to_excel(writer, index=False)

# to remove source header formatting

for idx, val in enumerate(column_list):
    worksheet.write(0, idx, val)

# workbook stuff

workbook = xlsxwriter.Workbook('export_top200FORMATTED.xlsx')
worksheet = writer.sheets['Sheet1']

format1 = workbook.add_format({'font_color': 'red'})

worksheet.set_column('B:B', 11, format1)


writer.save()

Solution

The issue is that the program is overwriting the xlsx file created by Pandas with a new one created by XlsxWriter while trying to use the worksheet created by Pandas to add formatting. The issue is here:

workbook = xlsxwriter.Workbook('export_top200FORMATTED.xlsx')
worksheet = writer.sheets['Sheet1']

The correct way to access a Pandas created workbook or worksheet is shown in the XlsxWriter documentation on Working with Python Pandas and XlsxWriter.

Here is a working example based on your code. It also fixes a issue in the code above where the first row of the dataframe data is overwritten:

import pandas as pd

# Create a Pandas dataframe from some data.
data = [10, 20, 30, 40, 50, 60]
df = pd.DataFrame({'Foo': data,
                   'Bar' : data,
                   'Baz' : data})

# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter("formatting.xlsx", engine='xlsxwriter')

# Convert the dataframe to an XlsxWriter Excel object. Note that we turn off
# the default header and skip one row to allow us to insert a user defined
# header.
df.to_excel(writer, sheet_name='Sheet1', startrow=1, header=False)

# Get the xlsxwriter workbook and worksheet objects.
workbook  = writer.book
worksheet = writer.sheets['Sheet1']

# Rewrite the column headers without formatting.
for col_num, value in enumerate(df.columns.values):
    worksheet.write(0, col_num + 1, value)

# Add a format to column B.
format1 = workbook.add_format({'font_color': 'red'})

worksheet.set_column('B:B', 11, format1)

# Close the Pandas Excel writer and output the Excel file.
writer.save()

Output: