Trying to calculate in pandas the rolling sum of Values in Column A, with lookback periods given in Column B and results of rolling sums stored in Column C.
Index | Column A | Column B || Column C |
| -------- | -------- || -------- |
0 | 1 | 1 || 1 |
1 | 2 | 2 || 3 |
2 | 1 | 3 || 4 |
3 | 3 | 2 || 4 |
4 | 2 | 4 || 8 |
For example, for last row, the rolling sum should sum the last 4 values from Column A, since 4 is given in column B.
Avoiding loops would be optimal.
Although a simple task, I haven't managed to come up with a solution.
Since your rolling sums depend on all values, you will have to compute one per window. This can be done using numpy and indexing lookup:
import numpy as np
idx, vals = pd.factorize(df['B'])
df['C'] = np.vstack([
df['A'].rolling(v, min_periods=1).sum()
for v in vals
])[idx, np.arange(len(df))]
Output:
A B C
0 1 1 1.0
1 2 2 3.0
2 1 3 4.0
3 3 2 4.0
4 2 4 8.0
Variant if you want to use different aggregation functions at once (here sum
and mean
for the demo, although D could also be computed as C/B):
import numpy as np
idx, vals = pd.factorize(df['B'])
df[['C', 'D']] = np.dstack([
df['A'].rolling(v, min_periods=1)
.agg(['sum', 'mean'])
for v in vals
])[np.arange(len(df)), :, idx]
Output:
A B C D
0 1 1 1.0 1.000000
1 2 2 3.0 1.500000
2 1 3 4.0 1.333333
3 3 2 4.0 2.000000
4 2 4 8.0 2.000000
Reproducible input:
df = pd.DataFrame({'A': [1,2,1,3,2],
'B': [1,2,3,2,4]})