Search code examples
pythonpython-3.xpandasdataframemulti-index

Modify pandas multiindex value for a particular group


Assume I have a pandas MultiIndex DataFrame with three different levels:

arrays = [['2020-03-30', '2020-03-30', '2020-03-30', '2020-04-15', '2020-04-15', '2020-04-15', '2020-05-10', '2020-05-10', '2020-06-10'], ['Firm1', 'Firm1', 'Firm2', 'Firm1', 'Firm2', 'Firm2', 'Firm1', 'Firm1', 'Firm1'], ['2022-01-01', '2023-02-01', '2021-01-05', '2021-04-01', '2022-01-01', '2024-02-01', '2021-02-05', '2022-04-01', '2022-04-01']]
idx = pd.MultiIndex.from_arrays(arrays, names = ('Buy_date', 'Firm', 'Sell_date'))
df = pd.DataFrame(np.zeros(9), index = idx)
df

                               0
Buy_date   Firm  Sell_date      
2020-03-30 Firm1 2022-01-01  0.0
                 2023-02-01  0.0
           Firm2 2021-01-05  0.0
2020-04-15 Firm1 2021-04-01  0.0
           Firm2 2022-01-01  0.0
                 2024-02-01  0.0
2020-05-10 Firm1 2021-02-05  0.0
                 2022-04-01  0.0
2020-06-10 Firm1 2022-04-01  0.0

For a given buy date, I want to change the Maximum Sell_date index value of Firm1 and set it to 01.01.2030. So the output should be:

Buy_date   Firm  Sell_date      
2020-03-30 Firm1 2022-01-01  0.0
                 2030-01-01  0.0
           Firm2 2021-01-05  0.0
2020-04-15 Firm1 2030-01-01  0.0
           Firm2 2022-01-01  0.0
                 2024-02-01  0.0
2020-05-10 Firm1 2021-02-05  0.0
                 2030-01-01  0.0
2020-06-10 Firm1 2030-01-01  0.0

Does anybody have an idea how to do this?


Solution

  • Here's one way using set_index and append a new index level with the required modifications. Also used groupby.transform to get the max dates:

    max_dates = df.reset_index(level='Sell_date').groupby(level=[0,1])['Sell_date'].transform('max')
    df = df.set_index(df.index.get_level_values(2)
                      .where((df.index.get_level_values(1)!='Firm1') | 
                             (df.index.get_level_values(2)!=max_dates), '2030-01-01'), 
                      append=True).droplevel(2)
    

    A more straightforward way is to use to reset_index, modify the DataFrame and set the index with the new columns:

    df1 = df.reset_index()
    max_date = df1.groupby(['Buy_date','Firm'])['Sell_date'].transform('max')
    df1.loc[df1['Sell_date'].eq(max_date) & df1['Firm'].eq('Firm1'), 'Sell_date'] = '2030-01-01'
    df = df1.set_index(['Buy_date','Firm','Sell_date'])
    

    Output:

                                   0
    Buy_date   Firm  Sell_date      
    2020-03-30 Firm1 2022-01-01  0.0
                     2030-01-01  0.0
               Firm2 2021-01-05  0.0
    2020-04-15 Firm1 2030-01-01  0.0
               Firm2 2022-01-01  0.0
                     2024-02-01  0.0
    2020-05-10 Firm1 2021-02-05  0.0
                     2030-01-01  0.0
    2020-06-10 Firm1 2030-01-01  0.0