When setting index of a pandas DataFrame, the last element of the columns array does not merge/group items together.
Assuming the following test data:
test_data = {
"desk": ["DESK1", "DESK2", "DESK3", "DESK4", "DESK5", "DESK6", "DESK7", "DESK8", "DESK9", "DESK10"],
"phone": ["111-1111", "111-1111", "111-1111", "111-1111", "444-4444", "444-4444", "111-1111", "111-1111", "123-4567", "123-4567"],
"email": ["Desk1_Email@email.com", "Desk1_Email@email.com", "Desk3_Email@email.com", "Desk4_Email@email.com", "Desk5@email.com", "Desk5@email.com", "Desk7@email.com", "Desk8@email.com", "Desk9@email.com", "Desk10@email.com"],
"team1": ["Adam", "xxxx", "Tiana", "", "Gina", "Gina", "Ruby", "Becca", "John", ""],
"team2": ["", "", "Dime", "", "Ed", "", "", "", "Fa", "Tim"],
}
Created a DataFrame:
import io
import pandas as pd
from django.http.response import HttpResponse
from rest_framework import status
### Create DataFrame from test_data
df = pd.DataFrame(test_data)
Then try to write & return the file
### Write & return the file
with io.BytesIO() as buffer:
with pd.ExcelWriter(buffer) as writer:
df: pd.DataFrame = df
groupby_columns = ['desk', 'phone', 'email']
df.set_index(groupby_columns, inplace=True, drop=True, append=False )
df.to_excel(writer, index=True, sheet_name="Team Matrix", merge_cells=True)
return HttpReponse(
buffer.getvalue(),
headers={
"Content-Type": "application/vnd.openxmlformats-" "officedocument.spreadsheetml.sheet",
"Content-Disposition": "attachment; filename=excel-export.xlsx",
},
status=status.HTTP_201_CREATED,
)
Which returns the following file: Returned Excel File
But what I want is for the first three columns (desk, phone, email) to be merge if same data, which with the above code it does for desk and phone column but phone column doesn't group/merge like the other two. Desired Returned Excel File
One possible solution, put empty (""
) values to cells you want and then merge the cells:
This will create a new dataframe with empty cells:
def fn(x):
x.loc[x.index[0] + 1 :, ["desk", "phone", "email"]] = ""
return x
empty_rows = df.loc[:, ["team1", "team2"]].eq("").all(axis=1)
groups = ((df["email"] != df["email"].shift()) | empty_rows).cumsum()
df = df.groupby(groups, group_keys=False).apply(fn)
Prints:
desk phone email team1 team2
0 DESK1 111-1111 Desk1_Email@email.com Adam
1 xxxx
2 DESK3 111-1111 Desk2_Email@email.com Tiana Dime
3 DESK4 111-1111 Desk2_Email@email.com
4 DESK5 444-4444 my_email@email.com Gina Ed
5 Gina
6 DESK7 111-1111 MagicSchoolbus Ruby
7 DESK8 111-1111 Desk8@email.com Becca
8 DESK9 123-4567 Desk9@email.com John Fa
9 DESK10 123-4567 Desk10@email.com Tim
This step will merge the first 3 columns in excel:
def merge_fn(g):
if len(g) == 1:
return
first, last = g.index[0] + 1, g.index[-1] + 1
worksheet.merge_range(first, 0, last, 0, g.iat[0, 0], merge_format)
worksheet.merge_range(first, 1, last, 1, g.iat[0, 1], merge_format)
worksheet.merge_range(first, 2, last, 2, g.iat[0, 2], merge_format)
writer = pd.ExcelWriter("out.xlsx", engine="xlsxwriter")
df.to_excel(writer, sheet_name="Team Matrix", index=False)
workbook = writer.book
worksheet = writer.sheets["Team Matrix"]
merge_format = workbook.add_format({"align": "left", "valign": "top", "border": 0})
df.groupby(groups, group_keys=False).apply(merge_fn)
writer.close()
Creates out.xlsx
(screenshot from LibreOffice):