I have a DataFrame that looks like this table:
index | x | y | value_1 | cumsum_1 | cumsum_2 |
---|---|---|---|---|---|
0 | 0.1 | 1 | 12 | 12 | 0 |
1 | 1.2 | 1 | 10 | 12 | 10 |
2 | 0.25 | 1 | 7 | 19 | 10 |
3 | 1.0 | 2 | 3 | 0 | 3 |
4 | 0.72 | 2 | 5 | 5 | 10 |
5 | 1.5 | 2 | 10 | 5 | 13 |
So my aim is to calculate the cumulative sum of value_1
. But there are two conditions that must be taken into account.
x
is less than 1 the cumsum()
is written in column cumsum_1
and if x
is greater in column cumsum_2
.y
indicates groups (1,2,3,...). When the value in y
changes, the cumsum()
-operation start all over again. I think the grouby()
method would help.Does somebody have any idea?
You can use .where()
on conditions x
< 1 or x
>= 1 to temporarily modify the values of value_1
to 0 according to the condition and then groupby cumsum, as follows:
The second condition is catered by the .groupby
function while the first condition is catered by the .where()
function, detailed below:
.where()
keeps the column values when the condition is true and change the values (to 0 in this case) when the condition is false. Thus, for the first condition where column x
< 1, value_1
will keep its values for feeding to the subsequent cumsum
step to accumulate the filtered values of value_1
. For rows where the condition x
< 1 is False, value_1
has its values masked to 0. These 0 passed to cumsum
for accumulation is effectively the same effect as taking out the original values of value_1
for the accumulation into
column cumsum_1
.
The second line of codes accumulates value_1
values to column cumsum_2
with the opposite condition of x
>= 1. These 2 lines of codes, in effect, allocate value_1
to cumsum_1
and cumsum_2
according to x
< 1 and x
>= 1, respectively.
(Thanks for the suggestion of @tdy to simplify the codes)
df['cumsum_1'] = df['value_1'].where(df['x'] < 1, 0).groupby(df['y']).cumsum()
df['cumsum_2'] = df['value_1'].where(df['x'] >= 1, 0).groupby(df['y']).cumsum()
Result:
print(df)
x y value_1 cumsum_1 cumsum_2
0 0.10 1 12 12 0
1 1.20 1 10 12 10
2 0.25 1 7 19 10
3 1.00 2 3 0 3
4 0.72 2 5 5 3
5 1.50 2 10 5 13