Only extract rows with Date Time overlaps

I am working with a data frame shown below. Within there are 2 columns that I would like to find out true date and time overlap rows based on values_1 column. The values_1 column has multiple item that repeats and from that I would only like to extract rows where the start date_time and end date_time (both datetime) overlap for that specific value within values_1 column.

Below is data frame I have:

values_1	uniqId	DeptId	Start DateTime	End DateTime
VGM	1278	BKNG	2023-09-07 18:35:28	2023-09-07 20:05:28
VGM	1259	BKNG	2023-09-07 18:55:18	2023-09-07 19:25:38
VGM	2567	BKNG	2023-09-07 14:29:38	2023-09-07 17:35:28
WEH	9854	ASKG	2023-09-05 13:45:58	2023-09-05 15:00:00
WEH	9124	ASKG	2023-09-05 17:25:28	2023-09-05 19:43:13
WEH	9785	ASKG	2023-09-05 16:15:21	2023-09-05 18:24:02
POH	3479	ASKG	2023-09-06 15:35:29	2023-09-06 17:25:22
POH	3449	ASKG	2023-09-06 15:45:28	2023-09-06 18:35:19
PNM	4579	FHDU	2023-09-04 14:15:29	2023-09-04 18:25:22
PNM	4987	FHDU	2023-09-04 14:45:28	2023-09-06 18:35:19

And the output table I would like to achieve is:

values_1	uniqId	DeptId	Start DateTime	End DateTime
VGM	1278	BKNG	2023-09-07 18:35:28	2023-09-07 20:05:28
VGM	1259	BKNG	2023-09-07 18:55:18	2023-09-07 19:25:38
WEH	9124	ASKG	2023-09-05 17:25:28	2023-09-05 19:43:13
WEH	9785	ASKG	2023-09-05 16:15:21	2023-09-05 18:24:02
POH	3479	ASKG	2023-09-06 15:35:29	2023-09-06 17:25:22
POH	3449	ASKG	2023-09-06 15:45:28	2023-09-06 18:35:19
PNM	4579	FHDU	2023-09-04 14:15:29	2023-09-04 18:25:22
PNM	4987	FHDU	2023-09-04 14:45:28	2023-09-06 18:35:19

So from above output table, POH is completely removed because NO time overlaps. Along with VGM and WEH 2 rows are dropped because their times did not overlap with among values_1 items. POH and PNM, both showed overlap and hence were part of the output.

Can someone please help me with this?

Thanks in advance.

I would like to extract data with overlapping time periods that has non-consecutive order dates. I did not find any example that I can apply to my data frame.

Thanks!

Solution

You can try:

from itertools import combinations


def find_overlaps(g):
    idx = pd.IntervalIndex.from_arrays(
        g["Start DateTime"], g["End DateTime"], closed="both"
    )

    to_keep = set()
    for (i1, interval1), (i2, interval2) in combinations(zip(g.index, idx.values), 2):
        if interval1.overlaps(interval2):
            to_keep.add(i1)
            to_keep.add(i2)

    return g.loc[list(to_keep)]


df["Start DateTime"] = pd.to_datetime(df["Start DateTime"])
df["End DateTime"] = pd.to_datetime(df["End DateTime"])

out = df.groupby("values_1", group_keys=False).apply(find_overlaps)
print(out)

Prints:

  values_1  uniqId DeptId      Start DateTime        End DateTime
8      PNM    4579   FHDU 2023-09-04 14:15:29 2023-09-04 18:25:22
9      PNM    4987   FHDU 2023-09-04 14:45:28 2023-09-06 18:35:19
6      POH    3479   ASKG 2023-09-06 15:35:29 2023-09-06 17:25:22
7      POH    3449   ASKG 2023-09-06 15:45:28 2023-09-06 18:35:19
0      VGM    1278   BKNG 2023-09-07 18:35:28 2023-09-07 20:05:28
1      VGM    1259   BKNG 2023-09-07 18:55:18 2023-09-07 19:25:38
4      WEH    9124   ASKG 2023-09-05 17:25:28 2023-09-05 19:43:13
5      WEH    9785   ASKG 2023-09-05 16:15:21 2023-09-05 18:24:02