Search code examples
pythonpandasmulti-index

How to group Pandas DataFrame with MultiIndex columns?


I have a MultiIndex column Pandas DataFrame A:

       foo    
       bar   baz
s1_a     1     2
s1_b     3     4
s2_a     5     6
s2_b     7     8

I want to group the data on a key from another DataFrame B:

       key
s1_a     1
s1_b     1
s2_a     2
s2_b     2

For a DataFrame without a MultiIndex, I would do:

pd.merge(A, B, left_index=True, right_index=True).groupby('key').sum()

but this does not work with the MultiIndex. The desired result is

    foo
    bar    baz
1     4      6
2    12     14

How can I achieve this?


Solution

  • Series can be passed to groupby directly and the grouper will do appropriate index alignment, so it is possible to do:

    A.groupby(B['key']).sum()
    
        foo    
        bar baz
    key        
    1     4   6
    2    12  14
    

    Setup:

    import numpy as np
    import pandas as pd
    
    idx = ['s1_a', 's1_b', 's2_a', 's2_b']
    A = pd.DataFrame(
        np.arange(1, 9).reshape((-1, 2)),
        index=idx,
        columns=pd.MultiIndex.from_product([['foo'], ['bar', 'baz']])
    )
    
    B = pd.DataFrame({'key': [1, 1, 2, 2]}, index=idx)
    

    Note this will work in many cases but is less durable than merging:

    B variant 1:

    B = pd.DataFrame({'key': [1, 2, 2]}, index=['s1_a', 's1_b', 's2_b'])
          key
    s1_a    1  # No s2_a
    s1_b    2
    s2_b    2
    
    
    A.groupby(B['key']).sum()
    
        foo    
        bar baz
    key        
    1.0   1   2
    2.0  10  12
    

    B variant 2:

    B = pd.DataFrame({'key': [1, 1, 2, 2]}, index=['s1_a', 's2_a', 's1_b', 's2_b'])
          key
    s1_a    1
    s2_a    1  # s1_a/s2_a together
    s1_b    2
    s2_b    2  # s1_b/s2_b together
    
    A.groupby(B['key']).sum()
    
        foo    
        bar baz
    key        
    1     6   8
    2    10  12