I have a dataframe with counts in one column and I would like to assign several cumulative sums of this column at once. I tried the below code but unfortunately it gives me only the last cumulative sum for all columns.
d = pd.DataFrame({'counts':[242,99,2,13,0]})
kwargs = {f"cumulative_{i}" : lambda x: x['counts'].shift(1).rolling(i).sum() for i in range(1,4)}
d.assign(**kwargs)
this is what it gives me
counts cumulative_1 cumulative_2 cumulative_3
0 242 NaN NaN NaN
1 99 NaN NaN NaN
2 2 NaN NaN NaN
3 13 343.0 343.0 343.0
4 0 114.0 114.0 114.0
but I would like to get this
counts cumulative_1 cumulative_2 cumulative_3
0 242 NaN NaN NaN
1 99 242.0 NaN NaN
2 2 99.0 341.0 NaN
3 13 2.0 101.0 343.0
4 0 13.0 15.0 114.0
what can I change to get the above?
Variable i
defined in the lambda has a global scope, and it's not captured in the lambda definition, i.e. it's always evaluated to 3
, the last value when the loop ends. In order to capture i
at definition time, you can define a wrapper function that captures i
for each iteration of the loop and returns the lambda that can infer the correct i
from it's enclosing environment:
def roll_i(i):
return lambda x: x['counts'].shift(1).rolling(i).sum()
kwargs = {f"cumulative_{i}" : roll_i(i) for i in range(1,4)}
d.assign(**kwargs)
counts cumulative_1 cumulative_2 cumulative_3
0 242 NaN NaN NaN
1 99 242.0 NaN NaN
2 2 99.0 341.0 NaN
3 13 2.0 101.0 343.0
4 0 13.0 15.0 114.0