I have data like this:
price | Date | Time |
---|---|---|
100 | 2021/01/01 | 9:00 |
200 | 2021/01/02 | 9:00 |
112 | 2021/01/01 | 9:01 |
223 | 2021/01/02 | 9:02 |
1145 | 2021/01/01 | 9:02 |
2214 | 2021/01/02 | 9:03 |
11 | 2021/01/01 | 9:03 |
20 | 2021/01/02 | 9:10 |
I need to get 3 values from each day: the price at 9:00, the price at 18:00 (there is more data than shown above), and a random price from that day excluding 9:00 and 18:00. Note that 9:00 is not the earliest time and 18:00 is not the latest.
I know I should use groupby, for example `df.groupby('Date')['price']`, but I don't know how to apply conditions to filter the data after the groupby.
Since I need these values for every day, I also need to retrieve them after filtering. The expected answer looks like [100, 112, 200] (100 is the price at 9:00, 112 is the random price, 200 is the price at 18:00).
I added some rows (including 18:00 prices) to your dataframe:
import pandas
from io import StringIO
csv = StringIO("""price,date,time
100,2021/01/01,9:00
200,2021/01/02,9:00
1800,2021/01/01,18:00
2800,2021/01/02,18:00
112,2021/01/01,9:01
223,2021/01/02,9:02
1145,2021/01/01,9:02
2214,2021/01/02,9:03
11,2021/01/01,9:03
20,2021/01/02,9:10
1145,2021/01/01,19:02
2214,2021/01/02,11:03
11,2021/01/01,19:03
20,2021/01/02,3:10""")
df = pandas.read_csv(csv, index_col=None)
I know the next part is a bit of a mess (I'm not a fan of pandas), but it should get the idea across. Just run the code:
grouped = df.groupby('date')
# Rows whose time is neither 9:00 nor 18:00
except18_9 = grouped.apply(lambda x: x[(x['time'] != '18:00') & (x['time'] != '9:00')]).reset_index(drop=True)
# One random row per day from those remaining times
part1 = except18_9.groupby('date').sample(n=1)
# The fixed 9:00 and 18:00 rows
part2 = grouped.apply(lambda x: x.loc[(x['time'] == '18:00') | (x['time'] == '9:00')]).reset_index(drop=True)
pandas.concat([part1, part2]).sort_values(['date', 'time'])
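As a side note, the two `groupby().apply()` calls above can be avoided entirely: a plain boolean mask built with `Series.isin` selects the fixed 9:00/18:00 rows, and its negation feeds `groupby('date').sample(n=1)` for the random row. This is a sketch of that alternative, assuming the same `date`/`time` string columns as in the dataframe above:

```python
import pandas
from io import StringIO

csv = StringIO("""price,date,time
100,2021/01/01,9:00
200,2021/01/02,9:00
1800,2021/01/01,18:00
2800,2021/01/02,18:00
112,2021/01/01,9:01
223,2021/01/02,9:02
1145,2021/01/01,9:02
2214,2021/01/02,9:03
11,2021/01/01,9:03
20,2021/01/02,9:10
1145,2021/01/01,19:02
2214,2021/01/02,11:03
11,2021/01/01,19:03
20,2021/01/02,3:10""")
df = pandas.read_csv(csv)

# Boolean mask marking the two fixed times
fixed = df['time'].isin(['9:00', '18:00'])

# One random row per day among the remaining times
random_part = df[~fixed].groupby('date').sample(n=1)

# Combine the fixed rows with the random ones, one group of 3 per day
result = pandas.concat([df[fixed], random_part]).sort_values(['date', 'time'])
print(result)
```

No `apply` or lambdas needed, and `df[fixed]` keeps the 9:00/18:00 rows for every day in one step. (`groupby(...).sample` requires pandas >= 1.1.)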
The final result looks like this: