Search code examples
pythonexcelopenpyxl

Saving Excel document removes the chart


I am using OpenPyXL for processing, there are only 2 lines of code:

workbook = load_workbook(filename='input.xlsx', keep_vba=True, rich_text=True)
# I will do operator here, but there is an error, so I comment this line
workbook.save('output.xlsx')

Then I get the output with less size:

Output

When I try to open output.xlsx, this notification appears:

Error Notification

The chart was removed by Microsoft Excel. Sometimes I cannot open it in Microsoft Excel, but LibreOffice can open it with no error.

Left: input.xlsx, right: output.xlsx. The chart was removed and Microsoft Excel reports an error:

Result

The alert:

enter image description here

Often my input file has ~ 4 sheets, a chart, and 1 sheet has data. My task is to write data to one sheet, I do not care about the others. I want to get one of these solutions:

  1. Allow OpenPyXL to read the charts the right way.
  2. Make OpenPyXL read a specific sheet, and other sheets will be passed.
  3. Avoid OpenPyXL decrease the size of the file (I think OpenPyXL does it for performance).

Using another library/language is the last option.


Solution

  • The problem is that your charts are combination charts. The chart plot area contains a bar chart and a line chart. Each of these charts has one series where the first series, which is the bar chart series, has idx 0 and order 1 while the second series, which is the line chart series, has idx 1 and order 0. The order is the series order in Select Data Source of the chart. There the line chart series is before the bar chart series.

    Seems openpyxl reader does not handle combination charts properly. It reads the first bar chart in plot area and re-indexes it's series. Other possible series of additional charts in plot area stay unchanged for idx and order. And because the reader only finds one chart series in first bar chart in plot area, this series get idx 0 and order 0 then, since it is the only one series known. The following not considered line chart series stays idx 1 and order 0. But then both series in plot area have the same order 0 and that leads to the error while opening in Excel.

    Workaround: Change the series order in Select Data Source of the charts. For left chart "Cost" before "CVs". For right chart "IMP" before "CPC". Then bar chart series has has idx 0 and order 0 and line chart series has has idx 1 and order 1. That works then.

    enter image description here

    If you don't want to change the source file, the following code should also work:

    from openpyxl import load_workbook
    
    wb = load_workbook("./input.xlsx", rich_text=True)
    
    for sheet in wb.worksheets:
        for chart in sheet._charts:
            idx = 0
            order = 0
            for subChart in chart._charts:
                print(subChart)
                for ser in subChart.ser:
                    print(ser.idx)
                    print(ser.order)
                    ser.idx = idx
                    idx += 1
                    ser.order = order
                    order += 1
                    print(ser.idx)
                    print(ser.order)
    
    wb.save("./output.xlsx")
    

    It re-indexes and re-orders all series in all sub-charts for each main-chart in worksheet.