I have a dataframe that is organised hierarchically. Consider this:
baseval
indexlevel0 indexlevel1 indexlevel2
L0-0 L1-0 L2-0 1
L2-1 1
L2-2 20
L1-1 L2-0 2
L2-1 2
L2-2 10
What I need to do is create a new dataframe, that collapses the intermediate level (indexlevel1) by replacing the corresponding (indexlevel2) with a single value that is the minimum of the two levels that were once contained in indexlevel1.
Probably easier to just show what I mean - the solution to the above example would be (that is, indexlevel0, and 2 are preserved, along with the minimum basevals per-indexlevel2):
minbylevel
indexlevel0 indexlevel2
L0-0 L2-0 1
L2-1 1
L2-2 10
I have not the slightest idea of where to even start with this. All the examples of aggregation etc work from the bottom up.
Here's some test code to create the starting point dataframe.
import pandas as pd
from io import StringIO
testdata = """
indexlevel0,indexlevel1,indexlevel2,baseval
L0-0,L1-0,L2-0,1
L0-0,L1-0,L2-1,1
L0-0,L1-0,L2-2,20
L0-0,L1-1,L2-0,2
L0-0,L1-1,L2-1,2
L0-0,L1-1,L2-2,10
"""
testinput = StringIO(testdata)
data_df = pd.read_csv(testinput, index_col=[0,1,2], header=[0]).sort_index()
print(data_df)
If only one column DataFrame is possible aggregate min
per first and third level of MultiIndex:
df = data_df.groupby(level=[0,2]).min()
print (df)
baseval
indexlevel0 indexlevel2
L0-0 L2-0 1
L2-1 1
L2-2 10
Or if there is multiple columns use DataFrameGroupBy.idxmin
with DataFrame.loc
and MultiIndex.droplevel
:
df = data_df.loc[data_df.groupby(level=[0,2])['baseval'].idxmin()].droplevel(1)
print (df)
baseval
indexlevel0 indexlevel2
L0-0 L2-0 1
L2-1 1
L2-2 10