When setting index of a pandas DataFrame, the last element of the columns array does not merge/group items together.
Assuming the following test data:
test_data = {
"desk": ["DESK1", "DESK2", "DESK3", "DESK4", "DESK5", "DESK6", "DESK7", "DESK8", "DESK9", "DESK10"],
"phone": ["111-1111", "111-1111", "111-1111", "111-1111", "444-4444", "444-4444", "111-1111", "111-1111", "123-4567", "123-4567"],
"email": ["[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]"],
"team1": ["Adam", "xxxx", "Tiana", "", "Gina", "Gina", "Ruby", "Becca", "John", ""],
"team2": ["", "", "Dime", "", "Ed", "", "", "", "Fa", "Tim"],
}
Created a DataFrame:
import io
import pandas as pd
from django.http.response import HttpResponse
from rest_framework import status
### Create DataFrame from test_data
df = pd.DataFrame(test_data)
Then try to write & return the file
### Write & return the file
with io.BytesIO() as buffer:
with pd.ExcelWriter(buffer) as writer:
df: pd.DataFrame = df
groupby_columns = ['desk', 'phone', 'email']
df.set_index(groupby_columns, inplace=True, drop=True, append=False )
df.to_excel(writer, index=True, sheet_name="Team Matrix", merge_cells=True)
return HttpReponse(
buffer.getvalue(),
headers={
"Content-Type": "application/vnd.openxmlformats-" "officedocument.spreadsheetml.sheet",
"Content-Disposition": "attachment; filename=excel-export.xlsx",
},
status=status.HTTP_201_CREATED,
)
Which returns the following file: Returned Excel File
But what I want is for the first three columns (desk, phone, email) to be merge if same data, which with the above code it does for desk and phone column but phone column doesn't group/merge like the other two. Desired Returned Excel File
One possible solution, put empty (""
) values to cells you want and then merge the cells:
This will create a new dataframe with empty cells:
def fn(x):
x.loc[x.index[0] + 1 :, ["desk", "phone", "email"]] = ""
return x
empty_rows = df.loc[:, ["team1", "team2"]].eq("").all(axis=1)
groups = ((df["email"] != df["email"].shift()) | empty_rows).cumsum()
df = df.groupby(groups, group_keys=False).apply(fn)
Prints:
desk phone email team1 team2
0 DESK1 111-1111 [email protected] Adam
1 xxxx
2 DESK3 111-1111 [email protected] Tiana Dime
3 DESK4 111-1111 [email protected]
4 DESK5 444-4444 [email protected] Gina Ed
5 Gina
6 DESK7 111-1111 MagicSchoolbus Ruby
7 DESK8 111-1111 [email protected] Becca
8 DESK9 123-4567 [email protected] John Fa
9 DESK10 123-4567 [email protected] Tim
This step will merge the first 3 columns in excel:
def merge_fn(g):
if len(g) == 1:
return
first, last = g.index[0] + 1, g.index[-1] + 1
worksheet.merge_range(first, 0, last, 0, g.iat[0, 0], merge_format)
worksheet.merge_range(first, 1, last, 1, g.iat[0, 1], merge_format)
worksheet.merge_range(first, 2, last, 2, g.iat[0, 2], merge_format)
writer = pd.ExcelWriter("out.xlsx", engine="xlsxwriter")
df.to_excel(writer, sheet_name="Team Matrix", index=False)
workbook = writer.book
worksheet = writer.sheets["Team Matrix"]
merge_format = workbook.add_format({"align": "left", "valign": "top", "border": 0})
df.groupby(groups, group_keys=False).apply(merge_fn)
writer.close()
Creates out.xlsx
(screenshot from LibreOffice):