Search code examples
pythonpandasdataframetimestampcomparison

Extracting Python Pandas records where Timestamp is within specific range


I have a data frame, df, where one of the columns stores processing times (TimeStamp objects).

A sample data frame:

from datetime import datetime, date
import pandas as pd

ids = ['WO_EW-1_10AUR-15-0031_00', 'IW-12_0400-15-0012_00', 'E-8_10AUR-18-0037_00']
dates = [date(2015,9,14), date(2015,9,17), date(2018,8,16)]
datetimes = [datetime(2015,9,14,13,23,40), datetime(2015,9,17,9,6,7), datetime(2018,8,16,7,32,6)]
datalist = list(zip(ids, dates, datetimes))

df = pd.DataFrame(datalist, columns=['ID', 'ProcessDate', 'ProcessingTime'])

enter image description here

What I want to achieve is to extract all records that satisfy a certain condition (or multiple conditions). In one case, I want to locate all records where 'ProcessingTime' attribute has an hour value that is greater than 13:10. In the sample data frame above, the desired output in this case would be the first record.

What is the proper way to apply this type of condition to data frame records?


P.S. I tried using the following, but both did not work:

df.loc[ (df['ProcessTime'].time().hour > 14) ]

This throws a "AttributeError" as 'Series' object has no attribute 'time'

and

df.loc[ (df['ProcessTime'] > datetime.time(14, 0, 0) ]

This throws a "TypeError" as Invalid comparison between dtype=datetime64[ns] and time


Solution

  • import pandas as pd
    from datetime import date, datetime, time
    
    ids = ['WO_EW-1_10AUR-15-0031_00', 'IW-12_0400-15-0012_00', 'E-8_10AUR-18-0037_00']
    dates = [date(2015,9,14), date(2015,9,17), date(2018,8,16)]
    datetimes = [datetime(2015,9,14,13,23,40), datetime(2015,9,17,9,6,7), datetime(2018,8,16,7,32,6)]
    datalist = list(zip(ids, dates, datetimes))
    
    df = pd.DataFrame(datalist, columns=['ID', 'ProcessDate', 'ProcessingTime'])
    
    # display(df)
                             ID ProcessDate      ProcessingTime
    0  WO_EW-1_10AUR-15-0031_00  2015-09-14 2015-09-14 13:23:40
    1     IW-12_0400-15-0012_00  2015-09-17 2015-09-17 09:06:07
    2      E-8_10AUR-18-0037_00  2018-08-16 2018-08-16 07:32:06
    
    # single condition
    df[df.ProcessingTime.dt.hour > 7]
    
    [out]:
                             ID ProcessDate      ProcessingTime
    0  WO_EW-1_10AUR-15-0031_00  2015-09-14 2015-09-14 13:23:40
    1     IW-12_0400-15-0012_00  2015-09-17 2015-09-17 09:06:07
    
    # multiple conditions
    df[(df.ProcessingTime.dt.hour > 7) & (df.ProcessingTime.dt.minute > 10)]
    
    [out]:
                             ID ProcessDate      ProcessingTime
    0  WO_EW-1_10AUR-15-0031_00  2015-09-14 2015-09-14 13:23:40
    
    # an entire datetime
    df[df.ProcessingTime < '2015-09-17 09:06:07']
    
    [out]:
                             ID ProcessDate      ProcessingTime
    0  WO_EW-1_10AUR-15-0031_00  2015-09-14 2015-09-14 13:23:40
    
    # using .time
    df[df.ProcessingTime.dt.time > time.fromisoformat('07:32:06')]
    
    [out]:
                             ID ProcessDate      ProcessingTime
    0  WO_EW-1_10AUR-15-0031_00  2015-09-14 2015-09-14 13:23:40
    1     IW-12_0400-15-0012_00  2015-09-17 2015-09-17 09:06:07