
Pandas function to group by cumulative sum and return another column when a certain amount is reached


Here is my problem. I have a dataframe like this:

ID   item   amount   level
1     1      10       5
1     1      10       10
2     4      15       5
2     9      30       8
2     4      10       10
2     4      10       20
3     4      10       4
3     4      10       6

For each ID, I need to know at what level the cumulative sum of each item reaches a fixed amount. For example, suppose I want the first level at which a given item reaches an amount of 20 or more for a user. I would like to get something like this:

ID   item   amount   level
1     1      10       5
1     1      20       10
2     4      15       5
2     9      30       8
2     4      25       10
2     4      35       20
3     4      10       4
3     4      20       6

Then I would like a list or a dictionary in which to store the results, for example:

d[item_number] = [list_of_levels_per_id_when_20_is_reached]

In this example:

{1: [10], 4: [10,6], 9: [8]}
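To pin down the intended semantics, here is a minimal plain-Python sketch (no pandas). `first_levels` and its `threshold` parameter are hypothetical names. The sketch assumes the rows are processed in the order shown, so "first" means the first row at which the running total for an (ID, item) pair reaches the threshold. Later rows for that pair are ignored.

```python
from collections import defaultdict

# Rows from the sample table: (ID, item, amount, level), in the order shown.
rows = [
    (1, 1, 10, 5),
    (1, 1, 10, 10),
    (2, 4, 15, 5),
    (2, 9, 30, 8),
    (2, 4, 10, 10),
    (2, 4, 10, 20),
    (3, 4, 10, 4),
    (3, 4, 10, 6),
]

def first_levels(rows, threshold=20):
    """Hypothetical helper: return {item: [level, ...]} with one level per ID,
    the level at which the running total of `amount` for that (ID, item)
    first reaches `threshold`."""
    totals = defaultdict(int)   # running amount per (ID, item)
    done = set()                # (ID, item) pairs that already hit the threshold
    result = defaultdict(list)
    for id_, item, amount, level in rows:
        key = (id_, item)
        if key in done:
            continue            # only the first crossing counts
        totals[key] += amount
        if totals[key] >= threshold:
            done.add(key)
            result[item].append(level)
    return dict(result)

print(first_levels(rows))  # prints {1: [10], 9: [8], 4: [10, 6]}
```

The keys come out in the order each item first reaches the threshold. The dictionary still compares equal to `{1: [10], 4: [10, 6], 9: [8]}`.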

Solution

  • cumsum

    You can compute the cumulative sum within each (ID, item) group with:

    df['amount_cumsum'] = df.groupby(['ID', 'item'])['amount'].cumsum()
    

    Output (as separate column for clarity):

       ID  item  amount  level  amount_cumsum
    0   1     1      10      5             10
    1   1     1      10     10             20
    2   2     4      15      5             15
    3   2     9      30      8             30
    4   2     4      10     10             25
    5   2     4      10     20             35
    6   3     4      10      4             10
    7   3     4      10      6             20
    

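  • dictionary

    Filter the rows where the running total reaches 20 and keep only the first such row per (ID, item), so that later rows past the threshold (such as ID 2, item 4 at level 20) are not counted. Then collect the levels per item:

    (df[df['amount_cumsum'].ge(20)]
     .drop_duplicates(['ID', 'item'])  # first row per (ID, item) that reaches 20
     .groupby('item')['level'].agg(list)
     .to_dict()
     )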
    

    Output:

    {1: [10], 4: [10, 6], 9: [8]}
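Putting the two steps together, here is a minimal sketch of a reusable function. `levels_reaching` and its `threshold` parameter are hypothetical names. The sketch assumes rows are already ordered by `level` within each (ID, item), because `cumsum` follows row order; if they are not, sort by `level` first. It uses `DataFrame.drop_duplicates`, which keeps the first occurrence by default, so each (ID, item) contributes only the first row that reaches the threshold.

```python
import pandas as pd

def levels_reaching(df, threshold=20):
    """Hypothetical helper: map each item to the list of levels (one per ID)
    at which that item's cumulative amount first reaches `threshold`."""
    # Running total of `amount` within each (ID, item) group, in row order.
    cumsum = df.groupby(['ID', 'item'])['amount'].cumsum()
    return (
        df[cumsum.ge(threshold)]            # rows at or past the threshold
        .drop_duplicates(['ID', 'item'])    # keep the first such row per (ID, item)
        .groupby('item')['level']
        .agg(list)                          # levels per item, in row order
        .to_dict()
    )

# Sample data from the question.
df = pd.DataFrame({
    'ID':     [1, 1, 2, 2, 2, 2, 3, 3],
    'item':   [1, 1, 4, 9, 4, 4, 4, 4],
    'amount': [10, 10, 15, 30, 10, 10, 10, 10],
    'level':  [5, 10, 5, 8, 10, 20, 4, 6],
})

print(levels_reaching(df))
print(levels_reaching(df, threshold=30))
```

With the sample data, `levels_reaching(df)` should compare equal to `{1: [10], 4: [10, 6], 9: [8]}`. Raising the threshold to 30 leaves `{4: [20], 9: [8]}`, because only ID 2 reaches 30 for items 4 and 9.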