Search code examples
python-3.xpandashttpresponseexport-to-excel

Python-Django setting Multi-Index of Pandas DataFrame doesn't group/merge last index


When setting index of a pandas DataFrame, the last element of the columns array does not merge/group items together.

Assuming the following test data:

test_data = {
    "desk": ["DESK1", "DESK2", "DESK3", "DESK4", "DESK5", "DESK6", "DESK7", "DESK8", "DESK9", "DESK10"],
    "phone": ["111-1111", "111-1111", "111-1111", "111-1111", "444-4444", "444-4444", "111-1111", "111-1111", "123-4567", "123-4567"],
    "email": ["Desk1_Email@email.com", "Desk1_Email@email.com", "Desk3_Email@email.com", "Desk4_Email@email.com", "Desk5@email.com", "Desk5@email.com", "Desk7@email.com", "Desk8@email.com", "Desk9@email.com", "Desk10@email.com"],
    "team1": ["Adam", "xxxx", "Tiana", "", "Gina", "Gina", "Ruby", "Becca", "John", ""],
    "team2": ["", "", "Dime", "", "Ed", "", "", "", "Fa", "Tim"],
}

Created a DataFrame:

import io
import pandas as pd
from django.http.response import HttpResponse
from rest_framework import status

### Create DataFrame from test_data
df = pd.DataFrame(test_data)

Then try to write & return the file

### Write & return the file
with io.BytesIO() as buffer:
    with pd.ExcelWriter(buffer) as writer:
        df: pd.DataFrame = df
        groupby_columns = ['desk', 'phone', 'email']
        df.set_index(groupby_columns, inplace=True, drop=True, append=False )
        df.to_excel(writer, index=True, sheet_name="Team Matrix", merge_cells=True)

        return HttpReponse(
            buffer.getvalue(),
            headers={
                "Content-Type": "application/vnd.openxmlformats-" "officedocument.spreadsheetml.sheet",
                "Content-Disposition": "attachment; filename=excel-export.xlsx",
            },
            status=status.HTTP_201_CREATED,
        )

Which returns the following file: Returned Excel File

But what I want is for the first three columns (desk, phone, email) to be merge if same data, which with the above code it does for desk and phone column but phone column doesn't group/merge like the other two. Desired Returned Excel File


Solution

  • One possible solution, put empty ("") values to cells you want and then merge the cells:

    This will create a new dataframe with empty cells:

    def fn(x):
        x.loc[x.index[0] + 1 :, ["desk", "phone", "email"]] = ""
        return x
    
    
    empty_rows = df.loc[:, ["team1", "team2"]].eq("").all(axis=1)
    groups = ((df["email"] != df["email"].shift()) | empty_rows).cumsum()
    df = df.groupby(groups, group_keys=False).apply(fn)
    

    Prints:

         desk     phone                  email  team1 team2
    0   DESK1  111-1111  Desk1_Email@email.com   Adam      
    1                                            xxxx      
    2   DESK3  111-1111  Desk2_Email@email.com  Tiana  Dime
    3   DESK4  111-1111  Desk2_Email@email.com             
    4   DESK5  444-4444     my_email@email.com   Gina    Ed
    5                                            Gina      
    6   DESK7  111-1111         MagicSchoolbus   Ruby      
    7   DESK8  111-1111        Desk8@email.com  Becca      
    8   DESK9  123-4567        Desk9@email.com   John    Fa
    9  DESK10  123-4567       Desk10@email.com          Tim
    

    This step will merge the first 3 columns in excel:

    def merge_fn(g):
        if len(g) == 1:
            return
        first, last = g.index[0] + 1, g.index[-1] + 1
        worksheet.merge_range(first, 0, last, 0, g.iat[0, 0], merge_format)
        worksheet.merge_range(first, 1, last, 1, g.iat[0, 1], merge_format)
        worksheet.merge_range(first, 2, last, 2, g.iat[0, 2], merge_format)
    
    
    writer = pd.ExcelWriter("out.xlsx", engine="xlsxwriter")
    
    df.to_excel(writer, sheet_name="Team Matrix", index=False)
    workbook = writer.book
    worksheet = writer.sheets["Team Matrix"]
    merge_format = workbook.add_format({"align": "left", "valign": "top", "border": 0})
    df.groupby(groups, group_keys=False).apply(merge_fn)
    
    writer.close()
    

    Creates out.xlsx (screenshot from LibreOffice):

    enter image description here