Search code examples
pythonpandaspandas-groupbypandas-rolling

Pandas groupby rolling for future values


I am trying to use a the pandas rolling function with window size 2 with groupby. This would be pretty standard other than that I also want the window to include the current value and the proceeding value.

Specifically, given

df = pd.DataFrame({'groups':['a','a','a','a','a','b','b','b','b','b'], 
                   'info': [i for i in range(10)]})

I want

pd.DataFrame({'groups':['a','a','a','a','a','b','b','b','b','b'], 
                   'info': [i for i in range(10)],
                   'groupsum':[1, 3, 5, 7, nan,  11, 13, 15, 17, nan]})

I have tried 2 strategies, both of which did not work. I first tried

indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=2)
df['groupsum'] = df.groupby('groups')['info'].rolling(window=indexer).mean().values

This way leads to kernel crashing, even for this toy dataframe. Very curious as to why.
My second way is to reverse the dataframe then use a regular groupby rolling operation:

df = df.iloc[::-1].copy()
df.index = range(df.shape[0])
df['groupsum'] = df.groupby('groups')['info'].rolling(2).sum().values

While the kernel does not crash with this method, it does not yield the dataframe I'd hoped for; it yields

pd.DataFrame({'groups':['a','a','a','a','a','b','b','b','b','b'], 
                   'info': [i for i in range(10)],
                   'groupsum':[nan,  7.,  5.,  3.,  1., nan, 17., 15., 13., 11.]})

I suppose there is an obvious solution here that I just don't know. Any help is appreciated!


Solution

  • If assign numpy array instead Series it not aligned correctly, never do it for avoid this problems. Need remove first level of MultiIndex by Series.reset_index with drop=True and then change order by indexing:

    indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=2)
    df['groupsum'] = df.groupby('groups')['info'].rolling(window=indexer).sum().reset_index(level=0, drop=True)
    print (df)
      groups  info  groupsum
    0      a     0       1.0
    1      a     1       3.0
    2      a     2       5.0
    3      a     3       7.0
    4      a     4       NaN
    5      b     5      11.0
    6      b     6      13.0
    7      b     7      15.0
    8      b     8      17.0
    9      b     9       NaN
    

    df = df.iloc[::-1].copy()
    df.index = range(df.shape[0])
    df['groupsum'] = df.groupby('groups')['info'].rolling(2).sum().reset_index(level=0, drop=True)
    df = df.iloc[::-1]
    print (df)
      groups  info  groupsum
    9      a     0       1.0
    8      a     1       3.0
    7      a     2       5.0
    6      a     3       7.0
    5      a     4       NaN
    4      b     5      11.0
    3      b     6      13.0
    2      b     7      15.0
    1      b     8      17.0
    0      b     9       NaN