I'm trying to create a python code to standardize excel files using openpyxl. The code is almost ready, but I'm having a lot of issues when I try to make two things:
In column A & B from row #7. Every single continuous vertical cell with the same content/information (non-empty) needs to be merge into just 1 big cell. For example: it can be 5 continuous vertical cells with the same content and I need to have just 1 big merge cell.
In column C, D & E from row #7. I would also want to merge empty cells with the first non-empty cell above them.
Here is the code I have so far.
from openpyxl import load_workbook
from openpyxl.utils import column_index_from_string, get_column_letter
from openpyxl.styles import Alignment
# Load the Excel file
workbook = load_workbook('test.xlsx')
# Get the list of existing sheet names
sheet_names = workbook.sheetnames
# Iterate through the sheets in reverse order to avoid index issues
for sheet_name in reversed(sheet_names):
sheet = workbook[sheet_name]
# Check if the sheet name is not "AVL Report"
if sheet_name != "AVL Report":
# Remove the sheet
workbook.remove(sheet)
else:
worksheet = workbook["AVL Report"]
worksheet.title = "BOM"
# Get the modified sheet
modified_sheet = workbook['BOM']
# Define the columns to delete
columns_to_delete = ['C', 'D', 'G', 'H', 'J', 'M', 'N', 'O', 'P', 'Q', 'R', 'S']
# Iterate through the columns in reverse order to avoid index issues
for column in reversed(columns_to_delete):
# Get the column index
column_index = column_index_from_string(column)
# Delete the column
modified_sheet.delete_cols(column_index)
# Save the modified workbook
workbook.save('test_modified.xlsx')
print('Done')
I have merged cell ranges based on the value by processing the data in a sheet and creating a dictionary which contains the necessary merges; cell value and range then applying the merges from the merge dictionary.
The example code below is a modification of this to suit Criteria 1 and 2 in your question.
As noted in the first sentence there are 2 sections;
Section 1 creates the merge dictionary merge_dict
. This section also consists of two sections to properly create the dictionary for the two criteria. The determination between the two criteria is by none_list
where it contains the Criteria2 columns, ['C', 'D', 'E']
.
The merge dictionary is a dictionary of dictionaries with dictionary values as a list. The column letter as the main key, and the cell value is a sub key in each column. The values for each sub key are the cell row numbers that contain that cell value.
Example Key A:
{A: {
'Merge1': [7, 8, 9, 10, 11],
'Merge10': [12],
'Merge2': [13, 17, 18, 20, 21, 22, 23, 24],
'Merge4': [14],
'Merge5': [15, 16],
'Merge6': [19]}
}
From the example; the column is A and cells with the value are shown as sub keys
i.e. 'Merge1'
is in rows 7, 8, 9, 10, 11.
For columns C, D & E;
{E: {
'Merge40': [7, 8],
'Merge41': [9, 10, 11, 12, 13],
'Merge42': [14, 15, 16, 17, 18, 19, 20],
'Merge43': [21, 22, 23, 24]}
}
The layout is the same however in this case the values are the cells below the sub key that are equal to None until the next cell with a value.
Section 2 Extracts the row ranges and combines with the column letter to create the merge across the range.
It ignores sub keys with only one value since a minimum of two cells are needed to merge.
Where the sub key exists more than once in the column these ranges are split into contiguous ranges.
For example in column A the sub key 'Merge2'
has a list like [13, 17, 18, 20, 21, 22, 23, 24]
there are two breaks in the range sequence between 13 & 17
and between 18 & 20
indicating there are three groups of cells to be merged. This section separates these groups, again dropping single cells. Therefore the result is a merge of A17-A18 and A20-A24 with A13 being dropped as a single cell.
import more_itertools as mit # pip install more_itertools
from openpyxl import load_workbook
from openpyxl.styles import NamedStyle, Font, Alignment
def merge_cells(cl, fr, lr): # cl = column letter, fr = first row, lr = last row
### Merge cells gives column letter and first and last row of the range
cell_range = f'{cl}{fr}:{cl}{lr}'
worksheet.merge_cells(cell_range)
worksheet[f'{cl}{fr}'].style = merge_style
def group_cell_ranges(cell_list):
### Use more itertools to group consecutive lists of rows.
ret_list = []
for grp in mit.consecutive_groups(cell_list):
grp_list = list(grp)
if len(grp_list) > 1: # Only add groups with more than one cell
ret_list.append([grp_list[0], grp_list[-1]])
return ret_list
### Named style to center the value in the merged cells
merge_style = NamedStyle(name="merge_style")
merge_style.font = Font(bold=True)
merge_style.alignment = Alignment(horizontal='center', vertical='center')
### Load workbook and worksheet
workbook = load_workbook('test.xlsx')
worksheet = workbook["AVL Report"]
workbook.add_named_style(merge_style) # Add named style to the workbook
merge_dict = {} # Dictionary for merged cells
last_value_dict = {}
none_list = ['C', 'D', 'E'] # Columns to merge none values per Criteria 2
### Section1 create 'merge_dict', contains the merge lists from the sheet ###
### Iterate from row 7 all columns to max row
for row in worksheet.iter_rows(min_row=7):
for cell in row:
column_letter = cell.column_letter
c_value = cell.value
c_row = cell.row
if column_letter not in merge_dict:
merge_dict[column_letter] = {}
### Columns under Criteria 1 and 2 need slightly different processing
### The none_list separates how the columns are processed
if column_letter in none_list: # Cells
if c_value is not None:
last_value_dict[column_letter] = c_value
else:
if column_letter in last_value_dict:
c_value = last_value_dict[column_letter]
if c_value is not None:
if c_value not in merge_dict[column_letter]:
merge_dict[column_letter][c_value] = [c_row]
else:
merge_dict[column_letter][c_value] += [c_row]
else:
if c_value is None: # Don't merge empty cells, remove if this is required
continue
if c_value not in merge_dict[column_letter]:
merge_dict[column_letter][c_value] = [c_row]
else:
merge_dict[column_letter][c_value] += [c_row]
### Section2, Add merges to created in merged_dict to the sheet ###
for merge_column, interim_dict in merge_dict.items():
print(f"Merging cells in column {merge_column}!")
for m_cell_value, like_cell_val_list in interim_dict.items():
m_size = len(like_cell_val_list)
if m_size < 2:
### Ignore single cells
print(f"The cell range consists of a single cell for cell value '{m_cell_value}', "
f"{merge_column}{like_cell_val_list[0]}")
continue
first_row, last_row = like_cell_val_list[0], like_cell_val_list[-1]
diff = last_row - first_row + 1
### Range for the cell value is contiguous
if diff == m_size:
print(f"The cell range is contiguous for cell value: '{m_cell_value}', {merge_column}{like_cell_val_list}")
merge_cells(merge_column, first_row, last_row)
else:
### Range for the cell value is broken
### Attempt to group contiguous ranges and drop single cells
print(f"The cell range is not contiguous for cell value '{m_cell_value}', "
f"{merge_column}{like_cell_val_list}")
print(f"Try to group '{m_cell_value}' ranges.")
grouped_list = group_cell_ranges(like_cell_val_list)
if len(grouped_list) >= 1:
print(f"Groups for: '{m_cell_value}', {merge_column}{grouped_list}")
else:
print(f"No groups for: '{m_cell_value}' in column {merge_column}")
for group in grouped_list:
print(f"Grouped cells for cell value: '{m_cell_value}', {merge_column}{group}")
first_row, last_row = group[0], group[-1]
merge_cells(merge_column, first_row, last_row)
print('')
workbook.save('test_out.xlsx')
The code is set to ignore empty cells in A & B so these remain unmerged.
e.g. B10-B12 and B22-B24
Code also set to ignore empty cells that do not have a cell with a value above them in C, D & E so these also remain unmerged.
e.g D7-D8
Example before and after sheet;
The column sizing and cell border is for highlighting merged cells, not part of the code.