Search code examples
pythonpandasdataframesumpandas-groupby

Pandas: Group by and conditional sum based on value of current row


My dataframe looks like this:

customer_nr order_value year_ordered payment_successful
1 50 1980 1
1 75 2017 0
1 10 2020 1
2 55 2000 1
2 300 2007 1
2 15 2010 0

I want to know the total amount a customer has successfully paid in the years before, for a specific order.

The expected output is as follows:

customer_nr order_value year_ordered payment_successful total_successfully_previously_paid
1 50 1980 1 0
1 75 2017 0 50
1 10 2020 1 50
2 55 2000 1 0
2 300 2007 1 55
2 15 2010 0 355

Closest i've gotten is this:

df.groupby(['customer_nr', 'payment_successful'], as_index=False)['order_value'].sum()

That just gives me the summed amount successfully and unsuccessfully paid all time per customer. It doesn't account for selecting only previous orders to participate in the sum.


Solution

  • Try:

    df["total_successfully_previously_paid"] = (df["payment_successful"].mul(df["order_value"])
                                                                        .groupby(df["customer_nr"])
                                                                        .transform(lambda x: x.cumsum().shift().fillna(0))
                                                )
    
    >>> df
       customer_nr  ...  total_successfully_previously_paid
    0            1  ...                                 0.0
    1            1  ...                                50.0
    2            1  ...                                50.0
    3            2  ...                                 0.0
    4            2  ...                                55.0
    5            2  ...                               355.0
    
    [6 rows x 5 columns]