My dataframe looks like this:
customer_nr | order_value | year_ordered | payment_successful |
---|---|---|---|
1 | 50 | 1980 | 1 |
1 | 75 | 2017 | 0 |
1 | 10 | 2020 | 1 |
2 | 55 | 2000 | 1 |
2 | 300 | 2007 | 1 |
2 | 15 | 2010 | 0 |
I want to know the total amount a customer has successfully paid in the years before, for a specific order.
The expected output is as follows:
customer_nr | order_value | year_ordered | payment_successful | total_successfully_previously_paid |
---|---|---|---|---|
1 | 50 | 1980 | 1 | 0 |
1 | 75 | 2017 | 0 | 50 |
1 | 10 | 2020 | 1 | 50 |
2 | 55 | 2000 | 1 | 0 |
2 | 300 | 2007 | 1 | 55 |
2 | 15 | 2010 | 0 | 355 |
Closest i've gotten is this:
df.groupby(['customer_nr', 'payment_successful'], as_index=False)['order_value'].sum()
That just gives me the summed amount successfully and unsuccessfully paid all time per customer. It doesn't account for selecting only previous orders to participate in the sum.
Try:
df["total_successfully_previously_paid"] = (df["payment_successful"].mul(df["order_value"])
.groupby(df["customer_nr"])
.transform(lambda x: x.cumsum().shift().fillna(0))
)
>>> df
customer_nr ... total_successfully_previously_paid
0 1 ... 0.0
1 1 ... 50.0
2 1 ... 50.0
3 2 ... 0.0
4 2 ... 55.0
5 2 ... 355.0
[6 rows x 5 columns]