Search code examples
python-3.xpandashttpresponseexport-to-excel

Python-Django setting Multi-Index of Pandas DataFrame doesn't group/merge last index


When setting index of a pandas DataFrame, the last element of the columns array does not merge/group items together.

Assuming the following test data:

test_data = {
    "desk": ["DESK1", "DESK2", "DESK3", "DESK4", "DESK5", "DESK6", "DESK7", "DESK8", "DESK9", "DESK10"],
    "phone": ["111-1111", "111-1111", "111-1111", "111-1111", "444-4444", "444-4444", "111-1111", "111-1111", "123-4567", "123-4567"],
    "email": ["[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]"],
    "team1": ["Adam", "xxxx", "Tiana", "", "Gina", "Gina", "Ruby", "Becca", "John", ""],
    "team2": ["", "", "Dime", "", "Ed", "", "", "", "Fa", "Tim"],
}

Created a DataFrame:

import io
import pandas as pd
from django.http.response import HttpResponse
from rest_framework import status

### Create DataFrame from test_data
df = pd.DataFrame(test_data)

Then try to write & return the file

### Write & return the file
with io.BytesIO() as buffer:
    with pd.ExcelWriter(buffer) as writer:
        df: pd.DataFrame = df
        groupby_columns = ['desk', 'phone', 'email']
        df.set_index(groupby_columns, inplace=True, drop=True, append=False )
        df.to_excel(writer, index=True, sheet_name="Team Matrix", merge_cells=True)

        return HttpReponse(
            buffer.getvalue(),
            headers={
                "Content-Type": "application/vnd.openxmlformats-" "officedocument.spreadsheetml.sheet",
                "Content-Disposition": "attachment; filename=excel-export.xlsx",
            },
            status=status.HTTP_201_CREATED,
        )

Which returns the following file: Returned Excel File

But what I want is for the first three columns (desk, phone, email) to be merge if same data, which with the above code it does for desk and phone column but phone column doesn't group/merge like the other two. Desired Returned Excel File


Solution

  • One possible solution, put empty ("") values to cells you want and then merge the cells:

    This will create a new dataframe with empty cells:

    def fn(x):
        x.loc[x.index[0] + 1 :, ["desk", "phone", "email"]] = ""
        return x
    
    
    empty_rows = df.loc[:, ["team1", "team2"]].eq("").all(axis=1)
    groups = ((df["email"] != df["email"].shift()) | empty_rows).cumsum()
    df = df.groupby(groups, group_keys=False).apply(fn)
    

    Prints:

         desk     phone                  email  team1 team2
    0   DESK1  111-1111  [email protected]   Adam      
    1                                            xxxx      
    2   DESK3  111-1111  [email protected]  Tiana  Dime
    3   DESK4  111-1111  [email protected]             
    4   DESK5  444-4444     [email protected]   Gina    Ed
    5                                            Gina      
    6   DESK7  111-1111         MagicSchoolbus   Ruby      
    7   DESK8  111-1111        [email protected]  Becca      
    8   DESK9  123-4567        [email protected]   John    Fa
    9  DESK10  123-4567       [email protected]          Tim
    

    This step will merge the first 3 columns in excel:

    def merge_fn(g):
        if len(g) == 1:
            return
        first, last = g.index[0] + 1, g.index[-1] + 1
        worksheet.merge_range(first, 0, last, 0, g.iat[0, 0], merge_format)
        worksheet.merge_range(first, 1, last, 1, g.iat[0, 1], merge_format)
        worksheet.merge_range(first, 2, last, 2, g.iat[0, 2], merge_format)
    
    
    writer = pd.ExcelWriter("out.xlsx", engine="xlsxwriter")
    
    df.to_excel(writer, sheet_name="Team Matrix", index=False)
    workbook = writer.book
    worksheet = writer.sheets["Team Matrix"]
    merge_format = workbook.add_format({"align": "left", "valign": "top", "border": 0})
    df.groupby(groups, group_keys=False).apply(merge_fn)
    
    writer.close()
    

    Creates out.xlsx (screenshot from LibreOffice):

    enter image description here