Search code examples
pythonpandasgroup-by

Pandas - groupby string field and select by time-of-day range


I have a dataset like this

index   Date_Time   Pass_ID El
0   3/30/23 05:12:36.36 A   1
1   3/30/23 05:12:38.38 A   2
1   3/30/23 05:12:40.40 A   3
1   3/30/23 05:12:42.42 A   4
1   3/30/23 05:12:44.44 A   4
1   3/30/23 12:12:50.50 B   3
1   3/30/23 12:12:52.52 B   4
1   3/30/23 12:12:54.54 B   5
1   3/30/23 12:12:56.56 B   6
1   3/30/23 12:12:58.58 B   7
1   3/30/23 12:13:00.00 B   8
1   3/30/23 12:13:02.02 B   9
1   3/31/23 20:02:02.02 C   3
1   3/31/23 20:02:05.05 C   4

The Date_Time is pandas datetime object.

I'd like to group the records by Pass_ID, and then select out only those unique Pass_IDs that occur between specific hours in the day: for instance, between 10:00 and 13:00 would return B.

I don't know how to get groupby and 'between_time' to work in this case... which would seem to be the best way forward. I've also tried using a lambda function after setting the Date_Time as the index, but that didn't work. And using aggregate doesn't seem to allow me to pull out the dt.hour of the Date_Time field. Anyone know how to do this concisely?


Solution

  • Try:

    # to datetime if necessary
    # df["Date_Time"] = pd.to_datetime(df["Date_Time"])
    
    out = df.set_index("Date_Time").between_time("10:00", "13:00")["Pass_ID"].unique()
    print(out)
    

    Prints:

    ['B']
    

    OR: If you want to filter whole groups between time 10:00-13:00:

    out = (
        df.groupby("Pass_ID")
        .filter(
            lambda x: len(x.set_index("Date_Time").between_time("10:00", "13:00")) == len(x)
        )["Pass_ID"]
        .unique()
    )
    print(out)