Search code examples
pythonpandasmulti-index

adding new column to pandas multi-index dataframe using rolling().max()


I have the following dataframe with multi-index:

dates = pd.date_range(start='2016-01-01 09:30:00', periods=20, freq='s')
df = pd.DataFrame({'A': [1] * 20 + [2] * 12 + [3] * 8,
               'B': np.concatenate((dates, dates)),
             'C': np.arange(40)})
df = df.set_index(["B","A"])

Now I want to create a new columns being the maximum value of the last two values for index A. I tried the following:

df.loc[:,"D"] = df.groupby(level="A").rolling(2).max()

But it only produces N/A for the new column ("D), since the order of the grouped dataframe index is the opposite of the original dataframe.

How can I solve this? I prefer to stay away from stacking/unstacking, swaplevel/sortlevel, join or concat since I have a big dataframe and these operations tend to be quite time consuming.


Solution

  • You need reset_index with drop parameter for remove first level of MultiIndex:

    df['D'] = df.groupby(level="A")['C'].rolling(2).max().reset_index(level=0, drop=True)
    print (df)
                            C     D
    B                   A          
    2016-01-01 09:30:00 1   0   NaN
    2016-01-01 09:30:01 1   1   1.0
    2016-01-01 09:30:02 1   2   2.0
    2016-01-01 09:30:03 1   3   3.0
    2016-01-01 09:30:04 1   4   4.0
    2016-01-01 09:30:05 1   5   5.0
    2016-01-01 09:30:06 1   6   6.0
    2016-01-01 09:30:07 1   7   7.0
    2016-01-01 09:30:08 1   8   8.0
    2016-01-01 09:30:09 1   9   9.0
    2016-01-01 09:30:10 1  10  10.0
    2016-01-01 09:30:11 1  11  11.0
    2016-01-01 09:30:12 1  12  12.0
    2016-01-01 09:30:13 1  13  13.0
    2016-01-01 09:30:14 1  14  14.0
    2016-01-01 09:30:15 1  15  15.0
    2016-01-01 09:30:16 1  16  16.0
    2016-01-01 09:30:17 1  17  17.0
    2016-01-01 09:30:18 1  18  18.0
    2016-01-01 09:30:19 1  19  19.0
    2016-01-01 09:30:00 2  20   NaN
    2016-01-01 09:30:01 2  21  21.0
    ...
    ...
    

    because:

    print (df.groupby(level="A")['C'].rolling(2).max())
    A  B                    A
    1  2016-01-01 09:30:00  1     NaN
       2016-01-01 09:30:01  1     1.0
       2016-01-01 09:30:02  1     2.0
       2016-01-01 09:30:03  1     3.0
       2016-01-01 09:30:04  1     4.0
       2016-01-01 09:30:05  1     5.0
       2016-01-01 09:30:06  1     6.0
       2016-01-01 09:30:07  1     7.0
       2016-01-01 09:30:08  1     8.0
       2016-01-01 09:30:09  1     9.0
       2016-01-01 09:30:10  1    10.0
       2016-01-01 09:30:11  1    11.0
       2016-01-01 09:30:12  1    12.0
       2016-01-01 09:30:13  1    13.0
       2016-01-01 09:30:14  1    14.0
       2016-01-01 09:30:15  1    15.0
       2016-01-01 09:30:16  1    16.0
       2016-01-01 09:30:17  1    17.0
       2016-01-01 09:30:18  1    18.0
       2016-01-01 09:30:19  1    19.0
    2  2016-01-01 09:30:00  2     NaN
       2016-01-01 09:30:01  2    21.0
    ...
    ...