Pandas groupby where the dataframe has a multilevel column Index


Is there a way to group by a column when the DataFrame has a multilevel column index?

For example, suppose we have the following DataFrame:

import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo'], ['one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(3, 6), index=['A', 'B', 'C'], columns=index)
df[('foo', 'two')] = ['L1', 'L1', 'L2']  # replace one column with group labels

which looks like:

first        bar                 baz                 foo    
second       one       two       one       two       one two
A      -0.484875 -1.150611  0.661847  0.513653 -0.775554  L1
B      -0.871233 -1.022598 -0.446219  0.306569 -1.031515  L1
C      -0.510137 -0.206838 -0.195791 -0.591447  0.830448  L2

How do I do something like the following?

df.groupby(('foo', 'two'))

The code raises an exception:

    raise ValueError("Grouper for '%s' not 1-dimensional" % t)
ValueError: Grouper for 'foo' not 1-dimensional

Solution

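  • As the comments suggested, the error is caused by an outdated pandas version, which treats a bare tuple passed to groupby as a list of separate keys ('foo' and 'two') rather than as a single column label.

    In that version (0.20.3), wrapping the tuple in a list makes the groupby work: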

    df.groupby([('foo', 'two')])
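
    For reference, here is a minimal, self-contained sketch of the list-wrapped form applied to a frame shaped like the one in the question. The seeded random data and the names key, group_sizes, and by_series are illustrative assumptions, not part of the original post. The last lines show an alternative that is not from the original answer: passing the column itself as a Series, which avoids the tuple-versus-list ambiguity.

```python
import numpy as np
import pandas as pd

# Rebuild a frame with two-level columns, as in the question (seeded for repeatability).
columns = pd.MultiIndex.from_product(
    [["bar", "baz", "foo"], ["one", "two"]], names=["first", "second"]
)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((3, 6)), index=["A", "B", "C"], columns=columns)
df[("foo", "two")] = ["L1", "L1", "L2"]

key = ("foo", "two")

# Wrapping the tuple in a list marks it as one column label, not two separate keys.
group_sizes = df.groupby([key]).size()
print(group_sizes)  # L1 has 2 rows, L2 has 1 row

# Alternative (an assumption, not from the answer): group by the column as a Series.
by_series = df.groupby(df[key]).size()
print(by_series)
```

    Both calls group rows A and B under L1 and row C under L2.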