Find longest interval between overlapping intervals in Python?

I have a list of intervals in the form of tuples such as this:

data = [(5,10), (3,10), (13,15), (12,18), (20,29), (25,30)]

Each item in the tuple has 2 values (i.e., start, end) and there may or may not be overlaps between different intervals. If there are overlapping intervals, I want to only take the longest ones. The output in this test would be the following:

output = [(3,10), (12,18), (20,29)]

How can I do this in Python using either the standard library, numpy, or pandas?

I started doing something naive like this but I don't think this will scale well...I also would rather not use NetworkX

import networkx as nx
data = [(5,10), (3,10), (13,15), (12,18), (20,29), (25,30)]

graph = nx.Graph()

n = len(data)
for i, a in enumerate(data):
    a_seq = set(range(a[0], a[1] + 1))
   
    for j in range(i+1, n):
        b = data[j]
        b_seq = set(range(b[0], b[1] + 1))

        n_overlap = len(a_seq & b_seq)
        if n_overlap:
            graph.add_edge(a, b, weight=n_overlap)

output = list()
for nodes in nx.connected_components(graph):
    lengths = dict()
    for node in nodes:
        start, end = node
        lengths[node] = end - start
    longest_interval, length_of_interval = sorted(lengths.items(), key=lambda x:x[1], reverse=True)[0]
    output.append(longest_interval)

I'm assuming there is a better approach but right now it's escaping me.

Edit: There might be some confusion in the task but I can't mix and match intervals (e.g., (20,30) is invalid because it's not one of the starting intervals).

Solution

Your post mentioned not using any libraries so that's what I did. Also, you mentioned many clarifications in the comments so this code handles all the edge cases.

def find_max_length_in_overlap_groups(intervals):
    #get sorted list of intervals by starting point
    aux = sorted(intervals, key=lambda x: x[0])
    new_group = [aux[0]]
    output = []
    for interval in aux[1:]:
        if interval[0] > new_group[-1][-1]:
        #if the interval does not overlap with the last interval we start a new group
            output.append(
                #find interval of maximum length
                max(new_group, key=lambda interval: interval[1] - interval[0])
            )
            #start new group
            new_group = [interval]
        else:
            #otherwise just append it onto the same group if it overlaps
            new_group.append(interval)
    #take care of the last group, doing the same as above
    output.append(max(new_group, key=lambda interval: interval[1] - interval[0]))
    return output

This approach is based on a solution for a similar problem found here, with a more complete explanantion. This approach doesn't preserve the original ordering of the output. If you want that you have to modify the function slightly, by storing the original indices and then sorting by them:

def find_max_length_in_overlap_groups(intervals):
    #store the indices
    indices={val:ind for ind, val in enumerate(intervals)}
    aux = sorted(intervals, key=lambda x: x[0])
    new_group = [aux[0]]
    output = []
    for interval in aux[1:]:
        if interval[0] > new_group[-1][-1]:
            output.append(
                max(new_group, key=lambda interval: interval[1] - interval[0])
            )
            new_group = [interval]
        else:
            new_group.append(interval)
    output.append(max(new_group, key=lambda interval: interval[1] - interval[0]))
    #sort the output according to indices in input to preserve order
    return sorted(output, key = lambda x: indices[x])

Both algorithms have the same time complexity: O(n log n) due to the sorting