I have a DataFrame of events (Event Name, Event Time) and a DataFrame of time windows (Start Time, End Time). I want a DataFrame containing only the events that do not fall inside any of the time windows. I am looking for a "pythonic" way to do this filtering.
Example (dates are DD/MM/YYYY). Events DataFrame:
Event Name | Event Time |
---|---|
Event1 | 02/01/2000 00:00:00 |
Event2 | 05/01/2000 10:00:00 |
Event3 | 07/01/2000 09:00:00 |
Event4 | 10/01/2000 02:00:00 |
Time Windows DataFrame:
Time Window Name | Start Time | End Time |
---|---|---|
Window1 | 01/01/2000 00:00:00 | 06/01/2000 00:00:00 |
Window2 | 10/01/2000 01:00:00 | 10/01/2000 04:00:00 |
Result: Filtered Events DataFrame:
Event Name | Event Time |
---|---|
Event3 | 07/01/2000 09:00:00 |
Setup:
```python
import pandas as pd

events_data = {
    'Event Name': ['Event1', 'Event2', 'Event3', 'Event4'],
    'Event Time': ['02/01/2000 00:00:00', '05/01/2000 10:00:00', '07/01/2000 09:00:00', '10/01/2000 02:00:00']
}
time_windows_data = {
    'Time Window Name': ['Window1', 'Window2'],
    'Start Time': ['01/01/2000 00:00:00', '10/01/2000 01:00:00'],
    'End Time': ['06/01/2000 00:00:00', '10/01/2000 04:00:00']
}
events_df = pd.DataFrame(events_data)
time_windows_df = pd.DataFrame(time_windows_data)
events_df['Event Time'] = pd.to_datetime(events_df['Event Time'], format='%d/%m/%Y %H:%M:%S')
time_windows_df['Start Time'] = pd.to_datetime(time_windows_df['Start Time'], format='%d/%m/%Y %H:%M:%S')
time_windows_df['End Time'] = pd.to_datetime(time_windows_df['End Time'], format='%d/%m/%Y %H:%M:%S')
```
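You can build an `IntervalIndex` from the windows, then create a boolean mask with `reindex`: each event time that falls inside a window picks up the value `False`, and each event time that matches no window is filled with `True`. (The code uses `df_events` and `df_time`, defined in the reproducible inputs at the end, in place of the question's `events_df` and `time_windows_df`.)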
```python
# build an IntervalIndex from the windows (right-closed by default)
idx = pd.IntervalIndex.from_arrays(df_time['Start Time'], df_time['End Time'])

# build boolean mask: False for events inside a window, True (fill value) otherwise
m = (pd.Series(False, index=idx)
     .reindex(df_events['Event Time'], fill_value=True)
     .to_numpy()
    )

# select non-matching rows
out = df_events[m]
```
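One detail worth checking: `IntervalIndex.from_arrays` builds right-closed intervals by default (the index dtype ends in `right`), so an event exactly at a window's Start Time counts as outside that window. If both boundaries should be inclusive, pass `closed='both'`. A minimal sketch, assuming a hypothetical `Event0` placed exactly on Window1's start and a hypothetical helper `outside_windows` wrapping the mask above:

```python
import pandas as pd

# Hypothetical data: Event0 sits exactly on Window1's start time
df_events = pd.DataFrame({
    'Event Name': ['Event0', 'Event3'],
    'Event Time': pd.to_datetime(['2000-01-01 00:00:00', '2000-01-07 09:00:00']),
})
df_time = pd.DataFrame({
    'Time Window Name': ['Window1', 'Window2'],
    'Start Time': pd.to_datetime(['2000-01-01 00:00:00', '2000-01-10 01:00:00']),
    'End Time': pd.to_datetime(['2000-01-06 00:00:00', '2000-01-10 04:00:00']),
})

def outside_windows(events, windows, closed):
    """Hypothetical helper: keep events not contained in any window."""
    idx = pd.IntervalIndex.from_arrays(windows['Start Time'], windows['End Time'],
                                       closed=closed)
    # False where an event falls in a window, True (fill value) where none matches
    m = (pd.Series(False, index=idx)
         .reindex(events['Event Time'], fill_value=True)
         .to_numpy())
    return events[m]

right = outside_windows(df_events, df_time, closed='right')  # start excluded
both = outside_windows(df_events, df_time, closed='both')    # start included
print(right['Event Name'].tolist())  # ['Event0', 'Event3']
print(both['Event Name'].tolist())   # ['Event3']
```

Keep in mind that with `closed='both'`, two windows sharing an endpoint overlap, and this lookup is meant for non-overlapping windows.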
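Alternatively, build `m` from the indexer returned by `IntervalIndex.reindex`, in which `-1` marks an event time that matches no window:

```python
m = idx.reindex(df_events['Event Time'])[1] == -1
```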
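The same lookup is available directly through `IntervalIndex.get_indexer`, which returns, for each event time, the position of the window containing it, or `-1` if there is none. A self-contained sketch with the example data; as far as I know it has the same requirement as `reindex`, namely that the windows do not overlap:

```python
import pandas as pd

df_events = pd.DataFrame({
    'Event Name': ['Event1', 'Event2', 'Event3', 'Event4'],
    'Event Time': pd.to_datetime(['2000-01-02 00:00:00', '2000-01-05 10:00:00',
                                  '2000-01-07 09:00:00', '2000-01-10 02:00:00']),
})
df_time = pd.DataFrame({
    'Time Window Name': ['Window1', 'Window2'],
    'Start Time': pd.to_datetime(['2000-01-01 00:00:00', '2000-01-10 01:00:00']),
    'End Time': pd.to_datetime(['2000-01-06 00:00:00', '2000-01-10 04:00:00']),
})

idx = pd.IntervalIndex.from_arrays(df_time['Start Time'], df_time['End Time'])

# position of the containing window for each event, -1 when no window contains it
pos = idx.get_indexer(df_events['Event Time'])  # [0, 0, -1, 1]
out = df_events[pos == -1]
print(out)
```

A side benefit is that `pos` also tells you which window each matched event fell into.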
Output:

```
  Event Name          Event Time
2     Event3 2000-01-07 09:00:00
```
Intermediates:

```
# idx
IntervalIndex([(2000-01-01 00:00:00, 2000-01-06 00:00:00],
               (2000-01-10 01:00:00, 2000-01-10 04:00:00]],
              dtype='interval[datetime64[ns], right]')

# m
array([False, False, True, False])
```
Reproducible inputs:

```python
import pandas as pd
from pandas import Timestamp

df_events = pd.DataFrame({'Event Name': ['Event1', 'Event2', 'Event3', 'Event4'],
                          'Event Time': [Timestamp('2000-01-02 00:00:00'),
                                         Timestamp('2000-01-05 10:00:00'),
                                         Timestamp('2000-01-07 09:00:00'),
                                         Timestamp('2000-01-10 02:00:00')]})

df_time = pd.DataFrame({'Time Window Name': ['Window1', 'Window2'],
                        'Start Time': [Timestamp('2000-01-01 00:00:00'), Timestamp('2000-01-10 01:00:00')],
                        'End Time': [Timestamp('2000-01-06 00:00:00'), Timestamp('2000-01-10 04:00:00')]})
```
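If the windows may overlap, the `IntervalIndex` lookups are not usable as-is (pandas raises and points to `IntervalIndex.get_indexer_non_unique`). A plain NumPy broadcasting comparison has no such restriction, at the cost of building an events × windows boolean matrix, so it suits moderate sizes. A sketch, assuming a hypothetical `Window3` that overlaps Window1 and treating both window ends as inclusive:

```python
import numpy as np
import pandas as pd

df_events = pd.DataFrame({
    'Event Name': ['Event1', 'Event2', 'Event3', 'Event4'],
    'Event Time': pd.to_datetime(['2000-01-02 00:00:00', '2000-01-05 10:00:00',
                                  '2000-01-07 09:00:00', '2000-01-10 02:00:00']),
})
# Hypothetical Window3 overlaps Window1
df_time = pd.DataFrame({
    'Time Window Name': ['Window1', 'Window2', 'Window3'],
    'Start Time': pd.to_datetime(['2000-01-01 00:00:00', '2000-01-10 01:00:00',
                                  '2000-01-04 00:00:00']),
    'End Time': pd.to_datetime(['2000-01-06 00:00:00', '2000-01-10 04:00:00',
                                '2000-01-05 00:00:00']),
})

t = df_events['Event Time'].to_numpy()[:, None]    # shape (n_events, 1)
start = df_time['Start Time'].to_numpy()[None, :]  # shape (1, n_windows)
end = df_time['End Time'].to_numpy()[None, :]      # shape (1, n_windows)

# inside[i, j] is True when event i falls in window j (both ends inclusive)
inside = (t >= start) & (t <= end)

# keep events that fall in no window at all
out = df_events[~inside.any(axis=1)]
print(out)
```

Switching `>=` or `<=` to a strict comparison gives the half-open variants.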