Search code examples
pythonsortingpandasmulti-index

Pandas Custom Sort Row in Multiindex


Given the following:

import pandas as pd
arrays = [['bar', 'bar', 'bar', 'baz', 'baz', 'baz', 'baz'],
          ['total', 'two', 'one', 'two', 'four', 'total', 'five']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
s = pd.Series(np.random.randn(7), index=index)
s

first  second
bar    total     0.334158
       two      -0.267854
       one       1.161727
baz    two      -0.748685
       four     -0.888634
       total     0.383310
       five      0.506120
dtype: float64

How do I ensure that the 'total' rows (per the second index) are always at the bottom of each group like this?:

first  second
bar    one       0.210911
       two       0.628357
       total    -0.911331
baz    two       0.315396
       four     -0.195451
       five      0.060159
       total     0.638313
dtype: float64

Solution

  • unstack for creating DataFrame with columns with second level of MultiIndex, then reorder columns for total to last column and last use ordered CategoricalIndex.

    So if stack level total is last.

    np.random.seed(123)
    arrays = [['bar', 'bar', 'bar', 'baz', 'baz', 'baz', 'baz'],
              ['total', 'two', 'one', 'two', 'four', 'total', 'five']]
    tuples = list(zip(*arrays))
    index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
    s = pd.Series(np.random.randn(7), index=index)
    print (s)
    first  second
    bar    total    -1.085631
           two       0.997345
           one       0.282978
    baz    two      -1.506295
           four     -0.578600
           total     1.651437
           five     -2.426679
    dtype: float64
    
    df = s.unstack()
    df = df[df.columns[df.columns != 'total'].tolist() + ['total']]
    df.columns = pd.CategoricalIndex(df.columns, ordered=True)
    print (df)
    second      five    four       one       two     total
    first                                                 
    bar          NaN     NaN  0.282978  0.997345 -1.085631
    baz    -2.426679 -0.5786       NaN -1.506295  1.651437
    
    s1 = df.stack()
    print (s1)
    first  second
    bar    one       0.282978
           two       0.997345
           total    -1.085631
    baz    five     -2.426679
           four     -0.578600
           two      -1.506295
           total     1.651437
    dtype: float64
    
    print (s1.sort_index())
    first  second
    bar    one       0.282978
           two       0.997345
           total    -1.085631
    baz    five     -2.426679
           four     -0.578600
           two      -1.506295
           total     1.651437
    dtype: float64