I have a dataframe:
data = {'time': ['10:00', '10:01', '10:02', '10:02', '10:03','10:04', '10:06', '10:10', '10:15'],
'price': [100, 101, 101, 103, 101,101, 105, 106, 107],
'volume': [50, 60, 30, 80, 20,50, 10, 40, 40]}
I need to group this df by 5-minute intervals and price, and sum up the volume:
df.groupby([df['time'].dt.floor('5min'), 'price']).agg({'volume': 'sum'}).reset_index()
Then I need to find the times at which, within each of those groups, the running sum of the volume becomes greater than 100.
In this df that happens at 10:03, where the sum is 60 + 30 + 20 = 110, and at 10:04, where the sum is 60 + 30 + 20 + 50 = 160.
How can I do this using pandas?
It looks like you want the cumulative sum of the volume, which you can get with groupby.cumsum:
df['cum_volume'] = (df.groupby([df['time'].dt.floor('5min'), 'price'])
['volume'].cumsum()
)
Updated df:
                 time  price  volume  cum_volume
0 2024-04-06 10:00:00    100      50          50
1 2024-04-06 10:01:00    101      60          60
2 2024-04-06 10:02:00    101      30          90
3 2024-04-06 10:02:00    103      80          80
4 2024-04-06 10:03:00    101      20         110
5 2024-04-06 10:04:00    101      50         160
6 2024-04-06 10:06:00    105      10          10
7 2024-04-06 10:10:00    106      40          40
8 2024-04-06 10:15:00    107      40          40
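To reproduce the table above from the raw dict in the question, the `time` strings first have to be converted to datetimes, since `.dt.floor` only works on a datetime column. Here is a minimal, self-contained sketch. The conversion step and the fixed date `2024-04-06` (taken from the output shown above) are assumptions about how `df` was built, not something stated in the question.

```python
import pandas as pd

data = {'time': ['10:00', '10:01', '10:02', '10:02', '10:03', '10:04', '10:06', '10:10', '10:15'],
        'price': [100, 101, 101, 103, 101, 101, 105, 106, 107],
        'volume': [50, 60, 30, 80, 20, 50, 10, 40, 40]}
df = pd.DataFrame(data)

# Assumption: all times belong to 2024-04-06, the date shown in the output above.
df['time'] = pd.to_datetime('2024-04-06 ' + df['time'])

# Running total of volume within each (5-minute bucket, price) group.
# The result keeps the original row order and index, so it can be assigned directly.
df['cum_volume'] = (df.groupby([df['time'].dt.floor('5min'), 'price'])
                      ['volume'].cumsum())

print(df)
```

Unlike `agg({'volume': 'sum'})`, which returns one row per group, `cumsum` returns one value per original row, which is what makes it possible to see at which time the total passes the threshold.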
You can then filter based on the value:
out = df.query('cum_volume > 100')
Output:
                 time  price  volume  cum_volume
4 2024-04-06 10:03:00    101      20         110
5 2024-04-06 10:04:00    101      50         160
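If you only want the first time each group crosses the threshold (10:03 here) rather than every row above it, one possible approach is to filter and then keep the first matching row per group. This is a sketch under that assumption, since the question lists both 10:03 and 10:04. The names `bucket` and `first_cross` are made up for illustration, and the date `2024-04-06` is taken from the output above.

```python
import pandas as pd

df = pd.DataFrame({
    'time': ['10:00', '10:01', '10:02', '10:02', '10:03', '10:04', '10:06', '10:10', '10:15'],
    'price': [100, 101, 101, 103, 101, 101, 105, 106, 107],
    'volume': [50, 60, 30, 80, 20, 50, 10, 40, 40],
})
# Assumption: all times belong to 2024-04-06.
df['time'] = pd.to_datetime('2024-04-06 ' + df['time'])

# Store the 5-minute bucket as a column so it can be reused as a grouping key.
df['bucket'] = df['time'].dt.floor('5min')
df['cum_volume'] = df.groupby(['bucket', 'price'])['volume'].cumsum()

# Keep rows above the threshold, then only the first such row per (bucket, price).
# Rows are assumed to be sorted by time already, as they are in the question.
first_cross = (df[df['cum_volume'] > 100]
               .drop_duplicates(subset=['bucket', 'price'], keep='first'))

print(first_cross[['time', 'price', 'cum_volume']])
```

If the rows are not already in time order, sort with `df.sort_values('time')` before computing the cumulative sum, otherwise the running total will not follow the order of events.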