Search code examples
pythonexcelopenpyxlcellbackground-color

Get correct background color for merged cells in xlsx file


I am currently trying to extract cells' background color from a xlsx file: image

I've tried two ways obtained from other stackoverflow posts:

1)

wb = load_workbook(excel_file, data_only = True)
sh = wb[wb.sheetnames[0]]
rows = sh.max_row 
cols = sh.max_column
bckg_color = np.empty(shape=(rows, cols), dtype=object)
for i in range(1,rows+1):
    for j in range(1,cols+1):
        cell = sh.cell(column=j, row=i)
        color_in_hex = cell.fill.start_color.index
        bckg_color[i-1, j-1] = str(color_in_hex)

pd.DataFrame(bckg_color)

sf = StyleFrame.read_excel(excel_file, read_style=True, use_openpyxl_styles=False, header=None)
bckg_color = StyleFrame(sf.applymap(lambda cell: cell.style.bg_color)).data_df.astype(str)

bckg_color

Both of them give the same result: df screenshot

The expected result was the same color on the 4th row, but it's not because of merged cells in that row. Is there a robust way (expect using bfill on the color dataframe) to get the colors so that the whole row would have the color I see on the screenshot? I suspect it can be done with getting information about merged cells from openpyxl, but I would not like to resort to that.


Solution

  • I don't think you can avoid dealing with merged cells via openpyxl. But it's quite simple, just check whether the cell belongs to a merged range or not:

    for i in range(1,rows+1):
        for j in range(1,cols+1):
            cell = None
    
            for mcl in sh.merged_cells:
                if mcl.min_row <= i <= mcl.max_row and mcl.min_col <= j <= mcl.max_col:
                    cell = mcl.start_cell # that's where the background info is stored
                    break
    
            if cell is None: cell = sh.cell(column=j, row=i)
            color_in_hex = cell.fill.start_color.index
            bckg_color[i-1, j-1] = str(color_in_hex)