Keep rows with true overlaps time period in pandas using specific columns values

I am working with a data frame shown below. Within there are 2 columns that I would like to find out true time overlap rows based on values_1 column. The values_1 column has multiple item that repeats and from that I would only like to extract rows where the start date_time and end date_time (both datetime) overlap for that specific value within values_1 column.

Below is data frame I have:

values_1	uniqId	DeptId	Start DateTime	End DateTime
VGM	1278	BKNG	2023-09-05 18:35:28	2023-09-05 20:05:28
VGM	1259	BKNG	2023-09-05 18:55:18	2023-09-05 19:25:38
VGM	2567	BKNG	2023-09-05 14:29:38	2023-09-05 17:35:28
WEH	9854	ASKG	2023-09-05 13:45:58	2023-09-05 15:00:00
WEH	9124	ASKG	2023-09-05 17:25:28	2023-09-05 19:43:13
WEH	9785	ASKG	2023-09-05 16:15:21	2023-09-05 18:24:02
POH	3479	ASKG	2023-09-05 15:35:29	2023-09-05 17:25:22
POH	3449	ASKG	2023-09-05 17:35:28	2023-09-05 18:35:19

And the output table I would like to achieve is:

values_1	uniqId	DeptId	Start DateTime	End DateTime
VGM	1278	BKNG	2023-09-05 18:35:28	2023-09-05 20:05:28
VGM	1259	BKNG	2023-09-05 18:55:18	2023-09-05 19:25:38
WEH	9124	ASKG	2023-09-05 17:25:28	2023-09-05 19:43:13
WEH	9785	ASKG	2023-09-05 16:15:21	2023-09-05 18:24:02

So from above output table, POH is completely removed because no time overlaps. Along with VGM and WEH 2 rows are dropped because their times did not overlap with among values_1 items.

Can someone please help me with this?

Thanks in advance.

I looked at different overlapping time periods example with no luck that I can apply to my data frame.

Thanks!

Solution

You can sort the times, then use a groupby.shift to compare the successive rows for boolean indexing:

# ensure datetime
df[['Start DateTime', 'End DateTime']] = df[['Start DateTime', 'End DateTime']].apply(pd.to_datetime)

# sort values
tmp = df.sort_values(by=['values_1', 'Start DateTime', 'End DateTime'])

# are consecutive intervals overlapping?
m = tmp['Start DateTime'].lt(tmp.groupby('values_1')['End DateTime'].shift())

# flag rows and previous ones
out = df.loc[m|m.groupby(tmp['values_1']).shift(-1)]

Output:

  values_1  uniqId DeptId      Start DateTime        End DateTime
0      VGM    1278   BKNG 2023-09-05 18:35:28 2023-09-05 20:05:28
1      VGM    1259   BKNG 2023-09-05 18:55:18 2023-09-05 19:25:38
4      WEH    9124   ASKG 2023-09-05 17:25:28 2023-09-05 19:43:13
5      WEH    9785   ASKG 2023-09-05 16:15:21 2023-09-05 18:24:02

Intermediates:

  values_1  uniqId DeptId      Start DateTime        End DateTime           end_shift      m m_shift  m|m_shift
6      POH    3479   ASKG 2023-09-05 15:35:29 2023-09-05 17:25:22                 NaT  False   False      False
7      POH    3449   ASKG 2023-09-05 17:35:28 2023-09-05 18:35:19 2023-09-05 17:25:22  False     NaN      False
2      VGM    2567   BKNG 2023-09-05 14:29:38 2023-09-05 17:35:28                 NaT  False   False      False
0      VGM    1278   BKNG 2023-09-05 18:35:28 2023-09-05 20:05:28 2023-09-05 17:35:28  False    True       True
1      VGM    1259   BKNG 2023-09-05 18:55:18 2023-09-05 19:25:38 2023-09-05 20:05:28   True     NaN       True
3      WEH    9854   ASKG 2023-09-05 13:45:58 2023-09-05 15:00:00                 NaT  False   False      False
5      WEH    9785   ASKG 2023-09-05 16:15:21 2023-09-05 18:24:02 2023-09-05 15:00:00  False    True       True
4      WEH    9124   ASKG 2023-09-05 17:25:28 2023-09-05 19:43:13 2023-09-05 18:24:02   True     NaN       True