Search code examples
pythonpandaspivot-tablemulti-index

Pandas sub- and total of pivot tables with multiindex


I am trying to add a new column with subtotals and a final column with totals. For example,

df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
               "B": ["one", "one", "one", "two", "two","one", "one", "two", "two"],
               "C": ["small", "large", "large", "small","small", "large", "small", "small", "large"],
               "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
               "E": [2, 4, 5, 5, 6, 6, 8, 9, 9]})

i.e:

     A    B      C  D  E
0  foo  one  small  1  2
1  foo  one  large  2  4
2  foo  one  large  2  5
3  foo  two  small  3  5
4  foo  two  small  3  6
5  bar  one  large  4  6
6  bar  one  small  5  8
7  bar  two  small  6  9
8  bar  two  large  7  9

Now I pivot:

table = pd.pivot_table(df, values=['D',"E"], index=['A'],columns=['C'])

and add the totals:

table['total'] = table.sum(axis=1)
for t in ["D", "E"]:
   table[t, "partial_total"]  = table[t].sum(axis=1)

While this numerically works, visually it's annoying. I would like to have all the data for D (including the partial_total), then E, then the total. Here's my resulting df:

        D               E                total             D             E
C   large     small large     small            partial_total partial_total
A                                                                         
bar   5.5  5.500000   7.5  8.500000  27.000000     11.000000     16.000000
foo   2.0  2.333333   4.5  4.333333  13.166667      4.333333      8.833333

so

how do I group together the values for the same (top level) columns?


Solution

  • Trying to perform operations before the pivot_table

    g = df.groupby(['A', 'C'])[['D', 'E']]
    
    d = (g.sum()/g.count()).reset_index()
    m = d.groupby('A', as_index=False).sum().assign(C='partial')
    
    final = pd.concat([m, d]).pivot_table(index='A', columns='C')
    

            D                          E                     
    C   large     small    partial large     small    partial
    A                                                        
    bar   5.5  5.500000  11.000000   7.5  8.500000  16.000000
    foo   2.0  2.333333   4.333333   4.5  4.333333   8.833333
    

    To answer specifically your last question

    how do I group together the values for the same (top level) columns?

    You may just sort_index

    table.sort_index(axis=1)
    

            D                             E                              total
    C   large partial_total     small large partial_total     small           
    A                                                                         
    bar   5.5     11.000000  5.500000   7.5     16.000000  8.500000  27.000000
    foo   2.0      4.333333  2.333333   4.5      8.833333  4.333333  13.166667