python algorithm computer-science graph-algorithm

Finding Overlapping Lists in a List of Lists

I have a list of lists that need to be merged based on common occurrences of list items. Lists that share elements need to be merged together to form clusters.

I considered a breadth-first traversal to do this, but because of the way the list of lists is arranged, it is difficult to implement the traversal

Example list of lists:

input: 
[
 [1,2,3],
 [2,4,5],
 [4,6,8],
 [9,10,16],
 [16,18,19],
 [20,21,22]
]

output: [[1,2,3,4,5,6,8], [9,10,16,18,19], [20,21,22]]

The first three lists need to be merged into a single list (first list and the second list have 2, second and third lists share 4), the fourth and fifth need to be merged because the two share 16. The third is not merged with any other list as it doesn't share any element with the others.

While this can be done in O(n^2) time (n being the number of lists), I am trying to find the most efficient way possible.

Solution

You can do this in O(N * log N) where N is total amount of items in all lists.

The idea is simple using Union Find data structure:

First let's create N disjoint sets for each unique item in input
Merge disjoint sets for all neighboring items for each lists
Collect the result from disjoint sets

Sample code:

def Find(id,P):
    if P[id]<0 : return id
    P[id]=Find(P[id],P)
    return P[id]

def Union(id1, id2, p):
    id1 = Find(id1,P)
    id2 = Find(id2,P)
    if id1 != id2:
        P[id2]=id1

input=[
 [1,2,3],
 [2,4,5],
 [4,6,8],
 [9,10,16],
 [16,18,19],
 [20,21,22]
]

P = {}

for list in input :
    for item in list :
        P[item] = -1

for list in input :
    for i in range(1,len(list)):
            Union(list[i-1], list[i], P)

ans = {}
for list in input :
    for item in list :
        if Find(item,P) not in ans:
            ans[Find(item,P)] = []
        ans[Find(item,P)].append(item)

ans = [set(x) for x in ans.values()]
print(ans)