I have a large multi-index dataframe with about 3 million data points. The first index is the date, the second is the name of the object. Then, each object has a value like so:
df = pd.DataFrame({'year': [2012, 2012, 2012, 2013, 2013, 2013],
'item':['apples', 'metals', 'water', 'apples', 'metals', 'water'],
'value': [10, 14, 16, 17, 13, 14]})
df.set_index(['year', 'item'], inplace = True)
This is a smaller example of the dataframe I'm working with. I want to return a dataframe that keeps the exact same format, but only has the top two values from each year.
So the final dataframe should output metals and water from 2012 with their value, and apples and water from 2013 with their respective value, and in the same format as the original dataframe.
IIUC sort_values
before groupby
tail
df.sort_values('value').groupby(level=0).tail(2).sort_index()
Out[258]:
value
year item
2012 metals 14
water 16
2013 apples 17
water 14