I am working with a data frame shown below. Within there are 2 columns that I would like to find out true time overlap rows based on values_1 column. The values_1 column has multiple item that repeats and from that I would only like to extract rows where the start date_time and end date_time (both datetime) overlap for that specific value within values_1 column.
Below is data frame I have:
values_1 | uniqId | DeptId | Start DateTime | End DateTime |
---|---|---|---|---|
VGM | 1278 | BKNG | 2023-09-05 18:35:28 | 2023-09-05 20:05:28 |
VGM | 1259 | BKNG | 2023-09-05 18:55:18 | 2023-09-05 19:25:38 |
VGM | 2567 | BKNG | 2023-09-05 14:29:38 | 2023-09-05 17:35:28 |
WEH | 9854 | ASKG | 2023-09-05 13:45:58 | 2023-09-05 15:00:00 |
WEH | 9124 | ASKG | 2023-09-05 17:25:28 | 2023-09-05 19:43:13 |
WEH | 9785 | ASKG | 2023-09-05 16:15:21 | 2023-09-05 18:24:02 |
POH | 3479 | ASKG | 2023-09-05 15:35:29 | 2023-09-05 17:25:22 |
POH | 3449 | ASKG | 2023-09-05 17:35:28 | 2023-09-05 18:35:19 |
And the output table I would like to achieve is:
values_1 | uniqId | DeptId | Start DateTime | End DateTime |
---|---|---|---|---|
VGM | 1278 | BKNG | 2023-09-05 18:35:28 | 2023-09-05 20:05:28 |
VGM | 1259 | BKNG | 2023-09-05 18:55:18 | 2023-09-05 19:25:38 |
WEH | 9124 | ASKG | 2023-09-05 17:25:28 | 2023-09-05 19:43:13 |
WEH | 9785 | ASKG | 2023-09-05 16:15:21 | 2023-09-05 18:24:02 |
So from above output table, POH is completely removed because no time overlaps. Along with VGM and WEH 2 rows are dropped because their times did not overlap with among values_1 items.
Can someone please help me with this?
Thanks in advance.
I looked at different overlapping time periods example with no luck that I can apply to my data frame.
Thanks!
You can sort the times, then use a groupby.shift
to compare the successive rows for boolean indexing:
# ensure datetime
df[['Start DateTime', 'End DateTime']] = df[['Start DateTime', 'End DateTime']].apply(pd.to_datetime)
# sort values
tmp = df.sort_values(by=['values_1', 'Start DateTime', 'End DateTime'])
# are consecutive intervals overlapping?
m = tmp['Start DateTime'].lt(tmp.groupby('values_1')['End DateTime'].shift())
# flag rows and previous ones
out = df.loc[m|m.groupby(tmp['values_1']).shift(-1)]
Output:
values_1 uniqId DeptId Start DateTime End DateTime
0 VGM 1278 BKNG 2023-09-05 18:35:28 2023-09-05 20:05:28
1 VGM 1259 BKNG 2023-09-05 18:55:18 2023-09-05 19:25:38
4 WEH 9124 ASKG 2023-09-05 17:25:28 2023-09-05 19:43:13
5 WEH 9785 ASKG 2023-09-05 16:15:21 2023-09-05 18:24:02
Intermediates:
values_1 uniqId DeptId Start DateTime End DateTime end_shift m m_shift m|m_shift
6 POH 3479 ASKG 2023-09-05 15:35:29 2023-09-05 17:25:22 NaT False False False
7 POH 3449 ASKG 2023-09-05 17:35:28 2023-09-05 18:35:19 2023-09-05 17:25:22 False NaN False
2 VGM 2567 BKNG 2023-09-05 14:29:38 2023-09-05 17:35:28 NaT False False False
0 VGM 1278 BKNG 2023-09-05 18:35:28 2023-09-05 20:05:28 2023-09-05 17:35:28 False True True
1 VGM 1259 BKNG 2023-09-05 18:55:18 2023-09-05 19:25:38 2023-09-05 20:05:28 True NaN True
3 WEH 9854 ASKG 2023-09-05 13:45:58 2023-09-05 15:00:00 NaT False False False
5 WEH 9785 ASKG 2023-09-05 16:15:21 2023-09-05 18:24:02 2023-09-05 15:00:00 False True True
4 WEH 9124 ASKG 2023-09-05 17:25:28 2023-09-05 19:43:13 2023-09-05 18:24:02 True NaN True