Search code examples
pythonpandasdataframemulti-index

Calculate new MultiIndex level from existing MultiIndex level values


For a DataFrame with two MultiIndex levels age and yearref, the goal is to add a new MultiIndex level yearconstr calculated as yearconstr = yearref - age.

import pandas as pd

df = pd.DataFrame({"value": [1, 2, 3]}, 
                  index=pd.MultiIndex.from_tuples([(10, 2015), (3, 2015), (2, 2016)], 
                                                  names=["age", "yearref"]))
print(df)

# input df:
             value
age yearref       
10  2015         1
3   2015         2
2   2016         3

We could reset the index, calculate a new column and then put the original index back in place plus the newly defined column, but surely there must be a better way.

df = (df.reset_index()
        .assign(yearconstr=lambda df: df.yearref - df.age)
        .set_index(list(df.index.names) + ["yearconstr"]))

print(df)

# expected result:
                        value
age yearref yearconstr       
10  2015    2005            1
3   2015    2012            2
2   2016    2014            3

Solution

  • For a concise and straight-forward approach, we can make use of

    • eval to generate a new Series calculated from the existing MultiIndex. This is easy since it treats index levels just like columns: df.eval("yearref - age")
    • rename the new Series
    • set_index to append the Series to df using the append=True argument.

    Putting everything together:

    df.set_index(df.eval("yearref - age").rename("yearconstr"), append=True)
    
    # result:
                            value
    age yearref yearconstr       
    10  2015    2005            1
    3   2015    2012            2
    2   2016    2014            3