Search code examples
pythonpandassumrolling-computation

Rolling Sum, with periods given from Pandas Column


Trying to calculate in pandas the rolling sum of Values in Column A, with lookback periods given in Column B and results of rolling sums stored in Column C.

Index | Column A | Column B || Column C |
      | -------- | -------- || -------- |
  0   |     1    |    1     ||    1     |
  1   |     2    |    2     ||    3     |
  2   |     1    |    3     ||    4     |
  3   |     3    |    2     ||    4     |
  4   |     2    |    4     ||    8     |

For example, for last row, the rolling sum should sum the last 4 values from Column A, since 4 is given in column B.

Avoiding loops would be optimal.

Although a simple task, I haven't managed to come up with a solution.


Solution

  • Since your rolling sums depend on all values, you will have to compute one per window. This can be done using and indexing lookup:

    import numpy as np
    
    idx, vals = pd.factorize(df['B'])
    
    df['C'] = np.vstack([
        df['A'].rolling(v, min_periods=1).sum()
        for v in vals
    ])[idx, np.arange(len(df))]
    

    Output:

       A  B    C
    0  1  1  1.0
    1  2  2  3.0
    2  1  3  4.0
    3  3  2  4.0
    4  2  4  8.0
    

    Variant if you want to use different aggregation functions at once (here sum and mean for the demo, although D could also be computed as C/B):

    import numpy as np
    
    idx, vals = pd.factorize(df['B'])
    
    df[['C', 'D']] = np.dstack([
        df['A'].rolling(v, min_periods=1)
               .agg(['sum', 'mean'])
        for v in vals
    ])[np.arange(len(df)), :, idx]
    

    Output:

       A  B    C         D
    0  1  1  1.0  1.000000
    1  2  2  3.0  1.500000
    2  1  3  4.0  1.333333
    3  3  2  4.0  2.000000
    4  2  4  8.0  2.000000
    

    Reproducible input:

    df = pd.DataFrame({'A': [1,2,1,3,2],
                       'B': [1,2,3,2,4]})