I have a data frame, df, where one of the columns stores processing times (TimeStamp objects).
A sample data frame:
from datetime import datetime, date
import pandas as pd
ids = ['WO_EW-1_10AUR-15-0031_00', 'IW-12_0400-15-0012_00', 'E-8_10AUR-18-0037_00']
dates = [date(2015,9,14), date(2015,9,17), date(2018,8,16)]
datetimes = [datetime(2015,9,14,13,23,40), datetime(2015,9,17,9,6,7), datetime(2018,8,16,7,32,6)]
datalist = list(zip(ids, dates, datetimes))
df = pd.DataFrame(datalist, columns=['ID', 'ProcessDate', 'ProcessingTime'])
What I want to achieve is to extract all records that satisfy a certain condition (or multiple conditions). In one case, I want to locate all records where 'ProcessingTime' attribute has an hour value that is greater than 13:10. In the sample data frame above, the desired output in this case would be the first record.
What is the proper way to apply this type of condition to data frame records?
P.S. I tried using the following, but both did not work:
df.loc[ (df['ProcessTime'].time().hour > 14) ]
This throws a "AttributeError" as 'Series' object has no attribute 'time'
and
df.loc[ (df['ProcessTime'] > datetime.time(14, 0, 0) ]
This throws a "TypeError" as Invalid comparison between dtype=datetime64[ns] and time
.dt
accessor.dt.
followed by the desired method (e.g. pandas.Series.dt.time
)import pandas as pd
from datetime import date, datetime, time
ids = ['WO_EW-1_10AUR-15-0031_00', 'IW-12_0400-15-0012_00', 'E-8_10AUR-18-0037_00']
dates = [date(2015,9,14), date(2015,9,17), date(2018,8,16)]
datetimes = [datetime(2015,9,14,13,23,40), datetime(2015,9,17,9,6,7), datetime(2018,8,16,7,32,6)]
datalist = list(zip(ids, dates, datetimes))
df = pd.DataFrame(datalist, columns=['ID', 'ProcessDate', 'ProcessingTime'])
# display(df)
ID ProcessDate ProcessingTime
0 WO_EW-1_10AUR-15-0031_00 2015-09-14 2015-09-14 13:23:40
1 IW-12_0400-15-0012_00 2015-09-17 2015-09-17 09:06:07
2 E-8_10AUR-18-0037_00 2018-08-16 2018-08-16 07:32:06
# single condition
df[df.ProcessingTime.dt.hour > 7]
[out]:
ID ProcessDate ProcessingTime
0 WO_EW-1_10AUR-15-0031_00 2015-09-14 2015-09-14 13:23:40
1 IW-12_0400-15-0012_00 2015-09-17 2015-09-17 09:06:07
# multiple conditions
df[(df.ProcessingTime.dt.hour > 7) & (df.ProcessingTime.dt.minute > 10)]
[out]:
ID ProcessDate ProcessingTime
0 WO_EW-1_10AUR-15-0031_00 2015-09-14 2015-09-14 13:23:40
# an entire datetime
df[df.ProcessingTime < '2015-09-17 09:06:07']
[out]:
ID ProcessDate ProcessingTime
0 WO_EW-1_10AUR-15-0031_00 2015-09-14 2015-09-14 13:23:40
# using .time
df[df.ProcessingTime.dt.time > time.fromisoformat('07:32:06')]
[out]:
ID ProcessDate ProcessingTime
0 WO_EW-1_10AUR-15-0031_00 2015-09-14 2015-09-14 13:23:40
1 IW-12_0400-15-0012_00 2015-09-17 2015-09-17 09:06:07