Search code examples
pythonpandassortingseriesmulti-index

Sort a multi-index Series on a particular level using Categorical Index values


I am using Pandas groupby to get the monthly top n items in each year.

month_gr = df.groupby(by=[df.index.year, df.index.month_name(), df['Item Name']])
month_gr['Total'].sum().groupby(level=[0,1], group_keys=False).nlargest(5).sort_index(level=1)

which gives me the output as:

Order Datee  Order Datee  Item Name           
2020         August       12oz w/ lids            10097.50
                          8oz cup / lids          10246.50
                          Full fat Milk           32507.00
                          Grilled Chic WRAP       94166.58
                          Special Blend Beans     81855.00
             July         8oz cup / lids           4801.50
                          Arwa500ml                6700.41
                          Full fat Milk           13430.00
                          Spanish Latte ( R )      6480.00
                          Special Blend 500g      29880.00
             June         Full fat Milk            4740.00
                          MANAEESH CHEESE          3576.24
                          Marble cake              4810.65
                          NUTELLA CHEESECAKE       3350.90
                          Special Blend Beans      5652.00
             September    CLUB SANDWICH            1040.10
                          Cappuccino (Regular)     1404.80
                          Flat White (Regular)     1162.40
                          Ginger shot big          2016.00
                          Spanish Latte ( R )       926.40
Name: Total, dtype: float64

If I use the sort_index(level=1), it sorts the values using the alphabetical order giving me the same output. However, I want to sort it with the Monthly orders as below:

cats = ['January', 'February', 'March', 'April','May','June', 'July', 'August','September', 'October', 'November', 'December']

I have found one solution which is using pd.CategoricalIndex to sort a Single index Series based on the months, but I don't know how to use it for multi-index.

Please if you can explain how can I sort the above data based on Month (level 1) or more specifically sort it on year & month (level 0 & 1).


Solution

  • An example with shorted DataFrame.

    df = pd.DataFrame({
            'year': [2020, 2020, 2020, 2020, 2020, 2020],
            'month_name': ['August', 'August', 'August', 'July', 'July', 'September'],
            'Item Name': ['a', 'b', 'c', 'd', 'e', 'f'],
            'Total': [1, 2, 3, 4, 5, 6]
        })
    
    month_gr = df.groupby(by=['year', 'month_name', 'Item Name'])['Total'].sum()
    print(month_gr)
    

    Prints:

    year  month_name  Item Name
    2020  August      a            1
                      b            2
                      c            3
          July        d            4
                      e            5
          September   f            6
    Name: Total, dtype: int64
    

    Then you can reset index, set categorical columns, sort values and set index back:

    month_gr = month_gr.reset_index()
    
    cats = ['January', 'February', 'March', 'April','May','June', 'July', 'August','September', 'October', 'November', 'December']
    month_gr['month_name'] = pd.Categorical(month_gr['month_name'], cats, ordered=True)
    
    print(month_gr.sort_values(by=['year', 'month_name']).set_index(['year', 'month_name', 'Item Name']))
    

    Prints:

                               Total
    year month_name Item Name       
    2020 July       d              4
                    e              5
         August     a              1
                    b              2
                    c              3
         September  f              6