Search code examples
pythonpandaspandas-merge

How to add sub-total columns to a multilevel columns dataframe?


I've a dataframe with 3 levels of multi index columns:

quarter           Q1                        Q2                        Totals
year              2021        2022           2021         2022                      
                 qty orders  qty orders    qty orders   qty orders   qty orders
month name                                       
January          40  2        5   1         1   2         0 0             46  5
February         20  8        2   3         4   6         0 0             26  17
March            2  10        7   4         3   3         0 0             12  17
Totals           62 20       14   8         8   11        0 0             84  39

After doing a groupy by levels (0,2), I've the following subtotals dataframe:

quarter           Q1           Q2          Totals                     
                 qty orders  qty orders    qty orders  
month name                                       
January          45  3        1   2         46   5     
February         22  10       4   6         26   16     
March            9  14        3   3         12   17   
Totals           76 28        8   11        84   39

I need to insert the second into the first, without upsetting the columns, levels or index so that I get the following dataframe:

quarter       Q1                                   Q2                        Totals
year        2021        2022      Subtotal    2021        2022     Subtotal                 
            qty orders qty orders qty orders qty orders qty orders qty orders qty orders
month name                                       
January     40  2       5   1     45   3       1  2       0  0       1  2     46  5
February    20  8       2   3     22   10      4  6       0  0       4  6     26  16
March       2  10       7   4     9    14      3  3       0  0       3  3     12  17
Totals      62 20      14   8     76   28      8  11      0  0       8  11    84 39

How do I do this?


Solution

  • With your initial dataframe (before groupby):

    import pandas as pd
    
    
    df = pd.DataFrame(
        [
            [40, 2, 5, 1, 1, 2, 0, 0],
            [20, 8, 2, 3, 4, 6, 0, 0],
            [2, 10, 7, 4, 3, 3, 0, 0],
            [62, 20, 14, 8, 8, 11, 0, 0],
        ],
        columns=pd.MultiIndex.from_product(
            [("Q1", "Q2"), ("2021", "2022"), ("qty", "orders")]
        ),
        index=["January", "February", "March", "Totals"],
    )
    

    Here is one way to do it (using product from Python standard library's itertools module, otherwise a nested for-loop is also possible):

    # Add new columns
    for level1, level2 in product(["Q1", "Q2"], ["qty", "orders"]):
        df.loc[:, (level1, "subtotal", level2)] = (
            df.loc[:, (level1, "2021", level2)] + df.loc[:, (level1, "2022", level2)]
        )
    
    # Sort columns
    df = df.reindex(
        pd.MultiIndex.from_product(
            [("Q1", "Q2"), ("2021", "2022", "subtotal"), ("qty", "orders")]
        ),
        axis=1,
    )
    

    Then:

    print(df)
    # Output
               Q1                                      Q2                     \
             2021        2022        subtotal        2021        2022
              qty orders  qty orders      qty orders  qty orders  qty orders   
    January    40      2    5      1       45      3    1      2    0      0   
    February   20      8    2      3       22     11    4      6    0      0   
    March       2     10    7      4        9     14    3      3    0      0   
    Totals     62     20   14      8       76     28    8     11    0      0   
    
    
             subtotal
                  qty orders  
    January         1      2  
    February        4      6  
    March           3      3  
    Totals          8     11