I have a list of lists that need to be merged based on common occurrences of list items. Lists that share elements need to be merged together to form clusters.
I considered a breadth-first traversal to do this, but because of the way the list of lists is arranged, it is difficult to implement the traversal
Example list of lists:
input:
[
[1,2,3],
[2,4,5],
[4,6,8],
[9,10,16],
[16,18,19],
[20,21,22]
]
output: [[1,2,3,4,5,6,8], [9,10,16,18,19], [20,21,22]]
The first three lists need to be merged into a single list (first list and the second list have 2, second and third lists share 4), the fourth and fifth need to be merged because the two share 16. The third is not merged with any other list as it doesn't share any element with the others.
While this can be done in O(n^2) time (n being the number of lists), I am trying to find the most efficient way possible.
You can do this in O(N * log N) where N is total amount of items in all lists.
The idea is simple using Union Find data structure:
Sample code:
def Find(id,P):
if P[id]<0 : return id
P[id]=Find(P[id],P)
return P[id]
def Union(id1, id2, p):
id1 = Find(id1,P)
id2 = Find(id2,P)
if id1 != id2:
P[id2]=id1
input=[
[1,2,3],
[2,4,5],
[4,6,8],
[9,10,16],
[16,18,19],
[20,21,22]
]
P = {}
for list in input :
for item in list :
P[item] = -1
for list in input :
for i in range(1,len(list)):
Union(list[i-1], list[i], P)
ans = {}
for list in input :
for item in list :
if Find(item,P) not in ans:
ans[Find(item,P)] = []
ans[Find(item,P)].append(item)
ans = [set(x) for x in ans.values()]
print(ans)