Search code examples
pythonexcelopenpyxl

Merge vertical cells using openpyxl


I'm trying to create a python code to standardize excel files using openpyxl. The code is almost ready, but I'm having a lot of issues when I try to make two things:

  1. In column A & B from row #7. Every single continuous vertical cell with the same content/information (non-empty) needs to be merge into just 1 big cell. For example: it can be 5 continuous vertical cells with the same content and I need to have just 1 big merge cell.

  2. In column C, D & E from row #7. I would also want to merge empty cells with the first non-empty cell above them.

Here is the code I have so far.

from openpyxl import load_workbook
from openpyxl.utils import column_index_from_string, get_column_letter
from openpyxl.styles import Alignment

# Load the Excel file
workbook = load_workbook('test.xlsx')

# Get the list of existing sheet names
sheet_names = workbook.sheetnames

# Iterate through the sheets in reverse order to avoid index issues
for sheet_name in reversed(sheet_names):
    sheet = workbook[sheet_name]
    
    # Check if the sheet name is not "AVL Report"
    if sheet_name != "AVL Report":
        # Remove the sheet
        workbook.remove(sheet)
    else:
        worksheet = workbook["AVL Report"]
        worksheet.title = "BOM"

# Get the modified sheet
modified_sheet = workbook['BOM']

# Define the columns to delete
columns_to_delete = ['C', 'D', 'G', 'H', 'J', 'M', 'N', 'O', 'P', 'Q', 'R', 'S']

# Iterate through the columns in reverse order to avoid index issues
for column in reversed(columns_to_delete):
    # Get the column index
    column_index = column_index_from_string(column)
    
    # Delete the column
    modified_sheet.delete_cols(column_index)

# Save the modified workbook
workbook.save('test_modified.xlsx')
print('Done')

Solution

  • I have merged cell ranges based on the value by processing the data in a sheet and creating a dictionary which contains the necessary merges; cell value and range then applying the merges from the merge dictionary.
    The example code below is a modification of this to suit Criteria 1 and 2 in your question.
    As noted in the first sentence there are 2 sections;
    Section 1 creates the merge dictionary merge_dict. This section also consists of two sections to properly create the dictionary for the two criteria. The determination between the two criteria is by none_list where it contains the Criteria2 columns, ['C', 'D', 'E'].
    The merge dictionary is a dictionary of dictionaries with dictionary values as a list. The column letter as the main key, and the cell value is a sub key in each column. The values for each sub key are the cell row numbers that contain that cell value.
    Example Key A:

    {A: {
         'Merge1': [7, 8, 9, 10, 11], 
         'Merge10': [12], 
         'Merge2': [13, 17, 18, 20, 21, 22, 23, 24], 
         'Merge4': [14], 
         'Merge5': [15, 16], 
         'Merge6': [19]}
    }
    

    From the example; the column is A and cells with the value are shown as sub keys
    i.e. 'Merge1' is in rows 7, 8, 9, 10, 11.

    For columns C, D & E;

    {E: {
         'Merge40': [7, 8], 
         'Merge41': [9, 10, 11, 12, 13], 
         'Merge42': [14, 15, 16, 17, 18, 19, 20], 
         'Merge43': [21, 22, 23, 24]}
    }
    

    The layout is the same however in this case the values are the cells below the sub key that are equal to None until the next cell with a value.

    Section 2 Extracts the row ranges and combines with the column letter to create the merge across the range.
    It ignores sub keys with only one value since a minimum of two cells are needed to merge.
    Where the sub key exists more than once in the column these ranges are split into contiguous ranges.
    For example in column A the sub key 'Merge2' has a list like [13, 17, 18, 20, 21, 22, 23, 24] there are two breaks in the range sequence between 13 & 17 and between 18 & 20 indicating there are three groups of cells to be merged. This section separates these groups, again dropping single cells. Therefore the result is a merge of A17-A18 and A20-A24 with A13 being dropped as a single cell.

    import more_itertools as mit  # pip install more_itertools
    from openpyxl import load_workbook
    from openpyxl.styles import NamedStyle, Font, Alignment
    
    
    def merge_cells(cl, fr, lr):  # cl = column letter, fr = first row, lr = last row
        ### Merge cells gives column letter and first and last row of the range
        cell_range = f'{cl}{fr}:{cl}{lr}'
        worksheet.merge_cells(cell_range)
        worksheet[f'{cl}{fr}'].style = merge_style
    
    
    def group_cell_ranges(cell_list):
        ### Use more itertools to group consecutive lists of rows.
        ret_list = []
        for grp in mit.consecutive_groups(cell_list):
            grp_list = list(grp)
            if len(grp_list) > 1:  # Only add groups with more than one cell
                ret_list.append([grp_list[0], grp_list[-1]])
    
        return ret_list
    
    
    ### Named style to center the value in the merged cells
    merge_style = NamedStyle(name="merge_style")
    merge_style.font = Font(bold=True)
    merge_style.alignment = Alignment(horizontal='center', vertical='center')
    
    ### Load workbook and worksheet
    workbook = load_workbook('test.xlsx')
    worksheet = workbook["AVL Report"]
    
    workbook.add_named_style(merge_style)  # Add named style to the workbook
    
    merge_dict = {}  # Dictionary for merged cells
    last_value_dict = {}
    
    none_list = ['C', 'D', 'E']  # Columns to merge none values per Criteria 2
    
    ### Section1 create 'merge_dict', contains the merge lists from the sheet ###
    ### Iterate from row 7 all columns to max row
    for row in worksheet.iter_rows(min_row=7):
        for cell in row:
            column_letter = cell.column_letter
            c_value = cell.value
            c_row = cell.row
    
            if column_letter not in merge_dict:
                merge_dict[column_letter] = {}
    
            ### Columns under Criteria 1 and 2 need slightly different processing
            ### The none_list separates how the columns are processed
            if column_letter in none_list:  # Cells
                if c_value is not None:
                    last_value_dict[column_letter] = c_value
                else:
                    if column_letter in last_value_dict:
                        c_value = last_value_dict[column_letter]
    
                if c_value is not None:
                    if c_value not in merge_dict[column_letter]:
                        merge_dict[column_letter][c_value] = [c_row]
                    else:
                        merge_dict[column_letter][c_value] += [c_row]
            else:
                if c_value is None:  # Don't merge empty cells, remove if this is required
                    continue
    
                if c_value not in merge_dict[column_letter]:
                    merge_dict[column_letter][c_value] = [c_row]
                else:
                    merge_dict[column_letter][c_value] += [c_row]
    
    ### Section2, Add merges to created in merged_dict to the sheet ###
    for merge_column, interim_dict in merge_dict.items():
        print(f"Merging cells in column {merge_column}!")
        for m_cell_value, like_cell_val_list in interim_dict.items():
            m_size = len(like_cell_val_list)
            if m_size < 2:
                ### Ignore single cells
                print(f"The cell range consists of a single cell for cell value '{m_cell_value}', "
                      f"{merge_column}{like_cell_val_list[0]}")
                continue
    
            first_row, last_row = like_cell_val_list[0], like_cell_val_list[-1]
            diff = last_row - first_row + 1
            ### Range for the cell value is contiguous
            if diff == m_size:
                print(f"The cell range is contiguous for cell value: '{m_cell_value}', {merge_column}{like_cell_val_list}")
                merge_cells(merge_column, first_row, last_row)
            else:
                ### Range for the cell value is broken
                ### Attempt to group contiguous ranges and drop single cells
                print(f"The cell range is not contiguous for cell value '{m_cell_value}', "
                      f"{merge_column}{like_cell_val_list}")
                print(f"Try to group '{m_cell_value}' ranges.")
                grouped_list = group_cell_ranges(like_cell_val_list)
                if len(grouped_list) >= 1:
                    print(f"Groups for: '{m_cell_value}', {merge_column}{grouped_list}")
                else:
                    print(f"No groups for: '{m_cell_value}' in column {merge_column}")
    
                for group in grouped_list:
                    print(f"Grouped cells for cell value: '{m_cell_value}', {merge_column}{group}")
                    first_row, last_row = group[0], group[-1]
                    merge_cells(merge_column, first_row, last_row)
    
        print('')
    
    workbook.save('test_out.xlsx')
    

    The code is set to ignore empty cells in A & B so these remain unmerged.
    e.g. B10-B12 and B22-B24
    Code also set to ignore empty cells that do not have a cell with a value above them in C, D & E so these also remain unmerged.
    e.g D7-D8

    Example before and after sheet;
    The column sizing and cell border is for highlighting merged cells, not part of the code. Example of sheet before and after