Find the sub-set of unique objects in a list which contains duplicate references to the objects

I have an algorithm that creates a set of lists in the values of a dictionary. However, the number of lists is less than the number of dictionary keys because some of the values are references to the same list object.

When the algorithm is complete, I want to extract a list of only the remaining unique list objects. I want to avoid having to compare two lists since this is inefficient.

I came up with this way to do it using the id function. Maybe this is fine but I wasn't sure if this is an appropriate use of id and wondering if there's a simpler way to do it.

# Start with a full set of unique lists
groups = {i: [i] for i in range(1, 6)}

# Algorithm joins some of the lists together which 
# reduces the total number
for a, b in [(1, 2), (2, 4)]:
    groups[a] += groups[b]
    groups[b] = groups[a]

print(groups)

Output:

{1: [1, 2, 4], 2: [1, 2, 4], 3: [3], 4: [1, 2, 4], 5: [5]}

Note: Now some of the values of the dictionary contain the same list, [1, 2, 4].

# Find the remaining unique lists
all_values = list(groups.values())
ids = [id(x) for x in all_values]
result = [all_values[ids.index(a)] for a in set(ids)]

print(result)

Output:

[[1, 2, 4], [3], [5]]

This is a common question in other languages but I couldn't find one on how to do this in Python.

Solution

What you have is a dictionary where values (lists) may be duplicated.

Sets are useful for reducing duplicate data. However, as lists are not hashable you need to convert them to something else - e.g., a tuple

You could do this:

groups = {i: [i] for i in range(1, 6)}

# Algorithm joins some of the lists together which 
# reduces the total number
for a, b in [(1, 2), (2, 4)]:
    groups[a] += groups[b]
    groups[b] = groups[a]

print(groups)

result = list(map(list, {*(tuple(e) for e in groups.values())}))

print(result)

Output:

[[3], [1, 2, 4], [5]]

A potential disadvantage of this technique is that the output order may not be what's required

Note:

Just to emphasise the id v. hash dilemma, consider this:

a=(1,2,3,4,5)           
b=(1,2,3,4,5)
                   
id(a)              
4429940832

id(b)           
4429952432

hash(a)           
-5659871693760987716

hash(b) 
-5659871693760987716