python pandas dataframe sum pandas-groupby

Pandas: Group by and conditional sum based on value of current row

My dataframe looks like this:

customer_nr	order_value	year_ordered	payment_successful
1	50	1980	1
1	75	2017	0
1	10	2020	1
2	55	2000	1
2	300	2007	1
2	15	2010	0

I want to know the total amount a customer has successfully paid in the years before, for a specific order.

The expected output is as follows:

customer_nr	order_value	year_ordered	payment_successful	total_successfully_previously_paid
1	50	1980	1	0
1	75	2017	0	50
1	10	2020	1	50
2	55	2000	1	0
2	300	2007	1	55
2	15	2010	0	355

Closest i've gotten is this:

df.groupby(['customer_nr', 'payment_successful'], as_index=False)['order_value'].sum()

That just gives me the summed amount successfully and unsuccessfully paid all time per customer. It doesn't account for selecting only previous orders to participate in the sum.

Solution

Try:

df["total_successfully_previously_paid"] = (df["payment_successful"].mul(df["order_value"])
                                                                    .groupby(df["customer_nr"])
                                                                    .transform(lambda x: x.cumsum().shift().fillna(0))
                                            )

>>> df
   customer_nr  ...  total_successfully_previously_paid
0            1  ...                                 0.0
1            1  ...                                50.0
2            1  ...                                50.0
3            2  ...                                 0.0
4            2  ...                                55.0
5            2  ...                               355.0

[6 rows x 5 columns]