My dataframe looks like this:
CUST_NO | ORDER_AMOUNT | PAYT_CODE | IS_PAYMENT_SUCCESSFUL |
---|---|---|---|
001 | 50 | OR | 1 |
001 | 20 | IC | 0 |
001 | 10 | IC | 1 |
002 | 55 | IC | 1 |
002 | 300 | MR | 1 |
002 | 215 | MR | 0 |
I want to know the total amount a customer has successfully paid all-time, specifically from the payment codes 'OR', 'IC'
. The dataframe is sorted and indexed by order date.
The expected output is shown in the CUMSUM_OR_IC_SUCCESSFUL
column:
CUST_NO | ORDER_AMOUNT | PAYT_CODE | IS_PAYMENT_SUCCESSFUL | CUMSUM_OR_IC_SUCCESSFUL |
---|---|---|---|---|
001 | 50 | OR | 1 | 0 |
001 | 20 | IC | 0 | 50 |
001 | 10 | IC | 1 | 50 |
002 | 55 | IC | 1 | 0 |
002 | 300 | MR | 1 | 55 |
002 | 215 | MR | 0 | 55 |
I already have some code that should work, but it just keeps running until the kernel crashes.
df["CUMSUM_OR_IC_SUCCESSFUL "] = (df.query("PAYT_CODE == ('OR', 'IC')")["IS_PAYMENT_SUCCESSFUL"].mul(df["ORDER_AMOUNT"])
.groupby(df["CUST_NO"])
.transform(lambda x: x.cumsum().shift().fillna(0))
)
Any help is appreciated!
After some experimenting, this one worked:
df["CUMSUM_GUARANTEED_SUCCESSFUL"] = df["ORDER_AMOUNT"].mul(df["PAYMENT_SUCCESSFUL"]).mul(df["PAYT_CODE"].isin(['IC', 'OC'])).groupby(df["CUST_NO"]).transform(lambda x: x.cumsum().shift().fillna(0))}