
How to filter data by conditions after Groupby in Python


I have data like this:

price  Date        Time
100    2021/01/01  9:00
200    2021/01/02  9:00
112    2021/01/01  9:01
223    2021/01/02  9:02
1145   2021/01/01  9:02
2214   2021/01/02  9:03
11     2021/01/01  9:03
20     2021/01/02  9:10

I need to get three values for each day: the price at 9:00, the price at 18:00 (the real data has more rows than shown here), and one random price from that day taken at any time other than 9:00 or 18:00. Note that 9:00 is not the first time of the day and 18:00 is not the last.

I know I should use groupby, for example df.groupby('Date')['price'], but I don't know how to filter the data by conditions after the groupby.

I need these values for every day, so after filtering I also need to extract them. The expected result for one day looks like [100, 112, 200], where 100 is the price at 9:00, 112 is the random price, and 200 is the price at 18:00.


Solution

  • I added some data to your dataframe:

    import pandas
    from io import StringIO
    
    csv = StringIO("""price,date,time
    100,2021/01/01,9:00
    200,2021/01/02,9:00
    1800,2021/01/01,18:00
    2800,2021/01/02,18:00
    112,2021/01/01,9:01
    223,2021/01/02,9:02
    1145,2021/01/01,9:02
    2214,2021/01/02,9:03
    11,2021/01/01,9:03
    20,2021/01/02,9:10
    1145,2021/01/01,19:02
    2214,2021/01/02,11:03
    11,2021/01/01,19:03
    20,2021/01/02,3:10""")
    
    df = pandas.read_csv(csv, index_col=None)
    

    The next part is not pretty, but I hope it gets the idea across. Just run the code:

    # Rows at the two fixed times (9:00 and 18:00).
    is_fixed = df['time'].isin(['9:00', '18:00'])
    # part1: one random row per day, drawn from all other times.
    part1 = df[~is_fixed].groupby('date').sample(n=1)
    # part2: the 9:00 and 18:00 rows of every day.
    part2 = df[is_fixed]
    # Note: 'time' holds strings, so this sort is lexicographic ('18:00' sorts before '9:00').
    result = pandas.concat([part1, part2]).sort_values(['date', 'time'])
    print(result)
    

    The final result looks like this:

    [image: results]
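The question asks for a list per day in the order [price at 9:00, random price, price at 18:00]. A minimal sketch of one way to build that is below. It assumes every day has exactly one row at 9:00, exactly one at 18:00, and at least one row at another time. The helper name `pick_three`, its parameters, and the shortened sample data are made up for illustration and are not part of the original answer.

```python
from io import StringIO

import pandas as pd

# Illustrative sample in the same shape as the dataframe above.
csv = StringIO("""price,date,time
100,2021/01/01,9:00
200,2021/01/02,9:00
1800,2021/01/01,18:00
2800,2021/01/02,18:00
112,2021/01/01,9:01
223,2021/01/02,9:02
1145,2021/01/01,19:02
20,2021/01/02,3:10""")
df = pd.read_csv(csv)


def pick_three(df, first="9:00", last="18:00", seed=None):
    """Return {date: [price at `first`, one random other price, price at `last`]}.

    Hypothetical helper; assumes one row per day at each fixed time.
    """
    result = {}
    for date, day in df.groupby("date"):
        # Prices at the two fixed times.
        first_price = day.loc[day["time"] == first, "price"].iloc[0]
        last_price = day.loc[day["time"] == last, "price"].iloc[0]
        # Candidate rows for the random pick: every other time of that day.
        others = day.loc[~day["time"].isin([first, last]), "price"]
        random_price = others.sample(n=1, random_state=seed).iloc[0]
        result[date] = [int(first_price), int(random_price), int(last_price)]
    return result


per_day = pick_three(df, seed=0)
for date, prices in per_day.items():
    print(date, prices)
```

Building the list explicitly avoids relying on sorting the `time` column, which holds strings and therefore does not sort chronologically. Passing `seed` only makes the random pick repeatable; leave it as `None` for a fresh pick on every run.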