I am using Pandas groupby to get the monthly top n items in each year.
month_gr = df.groupby(by=[df.index.year, df.index.month_name(), df['Item Name']])
month_gr['Total'].sum().groupby(level=[0,1], group_keys=False).nlargest(5).sort_index(level=1)
which gives me the output as:
Order Datee Order Datee Item Name
2020 August 12oz w/ lids 10097.50
8oz cup / lids 10246.50
Full fat Milk 32507.00
Grilled Chic WRAP 94166.58
Special Blend Beans 81855.00
July 8oz cup / lids 4801.50
Arwa500ml 6700.41
Full fat Milk 13430.00
Spanish Latte ( R ) 6480.00
Special Blend 500g 29880.00
June Full fat Milk 4740.00
MANAEESH CHEESE 3576.24
Marble cake 4810.65
NUTELLA CHEESECAKE 3350.90
Special Blend Beans 5652.00
September CLUB SANDWICH 1040.10
Cappuccino (Regular) 1404.80
Flat White (Regular) 1162.40
Ginger shot big 2016.00
Spanish Latte ( R ) 926.40
Name: Total, dtype: float64
If I use the sort_index(level=1), it sorts the values using the alphabetical order giving me the same output. However, I want to sort it with the Monthly orders as below:
cats = ['January', 'February', 'March', 'April','May','June', 'July', 'August','September', 'October', 'November', 'December']
I have found one solution which is using pd.CategoricalIndex to sort a Single index Series based on the months, but I don't know how to use it for multi-index.
Please if you can explain how can I sort the above data based on Month (level 1) or more specifically sort it on year & month (level 0 & 1).
An example with shorted DataFrame.
df = pd.DataFrame({
'year': [2020, 2020, 2020, 2020, 2020, 2020],
'month_name': ['August', 'August', 'August', 'July', 'July', 'September'],
'Item Name': ['a', 'b', 'c', 'd', 'e', 'f'],
'Total': [1, 2, 3, 4, 5, 6]
})
month_gr = df.groupby(by=['year', 'month_name', 'Item Name'])['Total'].sum()
print(month_gr)
Prints:
year month_name Item Name
2020 August a 1
b 2
c 3
July d 4
e 5
September f 6
Name: Total, dtype: int64
Then you can reset index, set categorical columns, sort values and set index back:
month_gr = month_gr.reset_index()
cats = ['January', 'February', 'March', 'April','May','June', 'July', 'August','September', 'October', 'November', 'December']
month_gr['month_name'] = pd.Categorical(month_gr['month_name'], cats, ordered=True)
print(month_gr.sort_values(by=['year', 'month_name']).set_index(['year', 'month_name', 'Item Name']))
Prints:
Total
year month_name Item Name
2020 July d 4
e 5
August a 1
b 2
c 3
September f 6