Search code examples
pythonexcelpandasdataframexlsxwriter

Unable to get text wrapping or vertical centering to work with xlsxwriter


I'm working with a set of dataframes in Pandas. When printing these dataframes to a .xlsx file, I'm printing each dataframe to a different tab. I'm using xlsxwriter to format the worksheets before saving. I'm running into trouble when I try to do anything other than align the cells to 'center'. The script has no issue with setting column width or row height. I cannot get text wrapping or vertical centering to work.

I have tried using the following code, but it never wraps the text in the saved or vertically centers in the .xlsx file. It does nothing.

cell_format2.set_text_wrap()
cell_format2.set_align('center')
cell_format2.set_align('vcenter')

I have also tried to do the formatting in the add_format function as follows:

cell_format2 = workbook.add_format({'text_wrap': True, 'valign': True})

Using the add_format function always gives me the following error:

AttributeError: 'bool' object has no attribute 'lower'

Here's my code where I'm printing 3 dataframes to 3 separate tabs with xlsxwriter

# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter(r'Ashland_Detailed_MTO.xlsx', engine='xlsxwriter')
# Write each dataframe to a different worksheet.
df1.to_excel(writer, index=None, header=True, sheet_name='List A')
df3.to_excel(writer, index=None, header=True, sheet_name='List B')
df5.to_excel(writer, index=None, header=True, sheet_name='List C')

workbook = writer.book
cell_format = workbook.add_format()
cell_format.set_align('center')
cell_format2 = workbook.add_format()
cell_format2.set_text_wrap()
cell_format2.set_align('center')

worksheet = writer.sheets['List A']
worksheet.freeze_panes(1, 0)
worksheet.set_column('A:A', 35, cell_format)
worksheet.set_column('B:B', 12, cell_format)
worksheet.set_column('C:C', 6, cell_format)
worksheet.set_column('D:D', 118, cell_format)
worksheet.set_column('E:H', 7, cell_format)
worksheet.set_column('I:J', 15, cell_format)
worksheet.set_column('K:N', 10, cell_format)

worksheet = writer.sheets['List B']
worksheet.freeze_panes(1, 0)
worksheet.set_column('A:G', 15, cell_format)

worksheet = writer.sheets['List C']
worksheet.set_row(0, 45, cell_format2)
worksheet.freeze_panes(1, 0)
worksheet.set_column('A:G', 12, cell_format)
worksheet.set_column('H:H', 17, cell_format)
worksheet.set_column('I:I', 26, cell_format)
worksheet.set_column('J:J', 20, cell_format)
worksheet.set_column('K:K', 57, cell_format)
worksheet.set_column('L:L', 20, cell_format)
worksheet.set_column('M:M', 57, cell_format)
worksheet.set_column('N:N', 16, cell_format)
worksheet.set_column('O:O', 20, cell_format)
worksheet.set_column('P:AL', 13, cell_format)
worksheet.set_column('AM:AM', 40, cell_format)
writer.save()

Each time I open the saved .xlsx file, the formatting is as shown below (1st image). I am wanting it to format as shown in the 2nd image.

image


Solution

  • On the documentaion site of xlsxwriter (https://xlsxwriter.readthedocs.io/working_with_pandas.html), you can find following information:

    Pandas writes the dataframe header with a default cell format. Since it is a cell format it cannot be overridden using set_row(). If you wish to use your own format for the headings then the best approach is to turn off the automatic header from Pandas and write your own.

    They also provide an example, in which the headers are written seperately after the dataframe is written to the xlsx file without headers:

    # Turn off the default header and skip one row to allow us to insert a
    # user defined header.
    df.to_excel(writer, sheet_name='Sheet1', startrow=1, header=False)
    
    # Get the xlsxwriter workbook and worksheet objects.
    workbook  = writer.book
    worksheet = writer.sheets['Sheet1']
    
    # Add a header format.
    header_format = workbook.add_format({
        'bold': True,
        'text_wrap': True,
        'valign': 'top',
        'fg_color': '#D7E4BC',
        'border': 1})
    
    # Write the column headers with the defined format.
    for col_num, value in enumerate(df.columns.values):
        worksheet.write(0, col_num + 1, value, header_format)
    

    In your case, the code would change to:

    # Create a Pandas Excel writer using XlsxWriter as the engine.
    writer = pd.ExcelWriter(r'Ashland_Detailed_MTO.xlsx', engine='xlsxwriter')
    # Write each dataframe to a different worksheet.
    df1.to_excel(writer, index=None, startrow=1, header=False, sheet_name='List A')
    df3.to_excel(writer, index=None, startrow=1, header=False, sheet_name='List B')
    df5.to_excel(writer, index=None, startrow=1, header=False, sheet_name='List C')
    
    
    # Get the xlsxwriter workbook and worksheet objects.
    workbook  = writer.book
    
    # Add a header format.
    header_format = workbook.add_format({
        'bold': True,
        'text_wrap': True,
        'valign': 'top',
        'border': 1})
    
    # Write the column headers with the defined format.
    worksheet = writer.sheets['List A']
    for col_num, value in enumerate(df1.columns.values):
        worksheet.write(0, col_num + 1, value, header_format)
    
    worksheet = writer.sheets['List B']
    for col_num, value in enumerate(df3.columns.values):
        worksheet.write(0, col_num + 1, value, header_format)
    
    ...
    

    You can think about packing the the formatting in a loop and iterate over all worksheets, when you get too much dataframes.