python pandas group-by

Using groupby in pandas with a condition


I have a DataFrame:

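import pandas as pd

data = {'time': ['10:00', '10:01', '10:02', '10:02', '10:03', '10:04', '10:06', '10:10', '10:15'],
        'price': [100, 101, 101, 103, 101, 101, 105, 106, 107],
        'volume': [50, 60, 30, 80, 20, 50, 10, 40, 40]}
df = pd.DataFrame(data)
df['time'] = pd.to_datetime(df['time'])  # parse the strings as datetimes so that .dt.floor works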

I need to group this df by 5-minute interval and by price, summing the volume:

df.groupby([df['time'].dt.floor('5min'), 'price']).agg({'volume': 'sum'}).reset_index()

Then I need to find the times, within each group, at which the running sum of the volume exceeds 100.

In this df that is 10:03, where the running sum for price 101 is 60 + 30 + 20 = 110, and 10:04, where it is 60 + 30 + 20 + 50 = 160.

How can I do this with pandas?


Solution

  • It looks like you want the cumulative sum of the volume within each group, which groupby.cumsum provides:

    df['cum_volume'] = (df.groupby([df['time'].dt.floor('5min'), 'price'])
                          ['volume'].cumsum()
                        )
    

    Updated df:

                     time  price  volume  cum_volume
    0 2024-04-06 10:00:00    100      50          50
    1 2024-04-06 10:01:00    101      60          60
    2 2024-04-06 10:02:00    101      30          90
    3 2024-04-06 10:02:00    103      80          80
    4 2024-04-06 10:03:00    101      20         110
    5 2024-04-06 10:04:00    101      50         160
    6 2024-04-06 10:06:00    105      10          10
    7 2024-04-06 10:10:00    106      40          40
    8 2024-04-06 10:15:00    107      40          40
    

    You can then keep only the rows whose cumulative volume exceeds 100:

    out = df.query('cum_volume > 100')
    

    Output:

                     time  price  volume  cum_volume
    4 2024-04-06 10:03:00    101      20         110
    5 2024-04-06 10:04:00    101      50         160
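
For reference, the steps above can be put together as a minimal, self-contained sketch. It makes a few assumptions that are not in the original post: the times are pinned to the fixed date 2024-04-06 so that the result is reproducible, `threshold` and `first_cross` are illustrative names, and the last step (keeping only the first row of each group that crosses the threshold) is an optional extra in case only the earliest crossing time is wanted.

```python
import pandas as pd

data = {
    'time': ['10:00', '10:01', '10:02', '10:02', '10:03', '10:04', '10:06', '10:10', '10:15'],
    'price': [100, 101, 101, 103, 101, 101, 105, 106, 107],
    'volume': [50, 60, 30, 80, 20, 50, 10, 40, 40],
}
df = pd.DataFrame(data)

# Assumption: every row belongs to the same day; a fixed date keeps the output stable.
df['time'] = pd.to_datetime('2024-04-06 ' + df['time'])

# Running total of volume within each (5-minute bucket, price) group, in row order.
df['cum_volume'] = (
    df.groupby([df['time'].dt.floor('5min'), 'price'])['volume'].cumsum()
)

# Rows at which the running total is above the threshold.
threshold = 100  # illustrative name, value taken from the question
out = df[df['cum_volume'] > threshold]

# Optional: only the first crossing in each (bucket, price) group.
first_cross = out.groupby([out['time'].dt.floor('5min'), 'price']).head(1)

print(out)
print(first_cross)
```

Because `cumsum` accumulates in row order, this relies on the rows already being sorted by time, as they are in the sample data; for unsorted input, a `df.sort_values('time')` beforehand would probably be needed.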