Search code examples
pythonpandasaggregationcumsum

Custom aggregation with conditional cumsum


I have a dataframe which looks like this

enter image description here

Depending on the user action (four types of action are possible) I want to accumulate user's amount of money at the given point of time. Actions of type A and B represent user income, actions of type C and D represent user's expense.

In other words, I want to have something like this as a result

enter image description here

User 1 first performs action_A and get 10 as result. Then action_B gives us 10 + 5 = 15. Action_C represents money loss and gives use 15 - 5 = 10. Finally, action_D is the same as C and we get 10 - 15 = -5.

How can I implement this with pandas? I tried custom aggregation using

expanding().apply(agg_func)

but didn't get any satisfying result.

EDIT: code for dataframe creation

ids = [1,1,1,1,2,2]
dates = ['2019-03-07 13:54', '2019-03-07 16:07', '2019-03-10 19:20', '2019-03-10 19:20', '2016-03-07 14:47', '2016-03-09 11:07']
amounts = [10., 5., 5., 15., 2., 4.]
actions = ['action_A', 'action_B', 'action_C', 'action_D', 'action_A', 'action_B']
result = [10, 15, 10, -5, 2, 6]

pd.DataFrame({'user_id': ids, 'start_date': dates, 'amount': amounts, 'action': actions, 'result': result}, index=range(6))

Solution

  • Multiple values by -1 by mask created by Series.isin and Series.mask and last use GroupBy.cumsum:

    df['result'] = (df['amount'].mask(df['action'].isin(['action_C','action_D']),
                                      df['amount'] * -1)
                               .groupby(df['user'])
                               .cumsum())
    print (df['result'])
    0    10.0
    1    15.0
    2    10.0
    3    -5.0
    4     2.0
    5     6.0
    Name: result, dtype: float64
    

    Similar solution with helper column:

    df['result'] = (df.assign(tmp = df['amount'].mask(df['action'].isin(['action_C','action_D']),
                                     df['amount']*-1))
                      .groupby('user')['tmp']
                      .cumsum())