I have an algorithm that creates a set of lists in the values of a dictionary. However, the number of lists is less than the number of dictionary keys because some of the values are references to the same list object.
When the algorithm is complete, I want to extract a list of only the remaining unique list objects. I want to avoid having to compare two lists since this is inefficient.
I came up with this way to do it using the id function. Maybe this is fine but I wasn't sure if this is an appropriate use of id and wondering if there's a simpler way to do it.
# Start with a full set of unique lists
groups = {i: [i] for i in range(1, 6)}
# Algorithm joins some of the lists together which
# reduces the total number
for a, b in [(1, 2), (2, 4)]:
groups[a] += groups[b]
groups[b] = groups[a]
print(groups)
Output:
{1: [1, 2, 4], 2: [1, 2, 4], 3: [3], 4: [1, 2, 4], 5: [5]}
Note: Now some of the values of the dictionary contain the same list, [1, 2, 4]
.
# Find the remaining unique lists
all_values = list(groups.values())
ids = [id(x) for x in all_values]
result = [all_values[ids.index(a)] for a in set(ids)]
print(result)
Output:
[[1, 2, 4], [3], [5]]
This is a common question in other languages but I couldn't find one on how to do this in Python.
What you have is a dictionary where values (lists) may be duplicated.
Sets are useful for reducing duplicate data. However, as lists are not hashable you need to convert them to something else - e.g., a tuple
You could do this:
groups = {i: [i] for i in range(1, 6)}
# Algorithm joins some of the lists together which
# reduces the total number
for a, b in [(1, 2), (2, 4)]:
groups[a] += groups[b]
groups[b] = groups[a]
print(groups)
result = list(map(list, {*(tuple(e) for e in groups.values())}))
print(result)
Output:
[[3], [1, 2, 4], [5]]
A potential disadvantage of this technique is that the output order may not be what's required
Note:
Just to emphasise the id v. hash dilemma, consider this:
a=(1,2,3,4,5)
b=(1,2,3,4,5)
id(a)
4429940832
id(b)
4429952432
hash(a)
-5659871693760987716
hash(b)
-5659871693760987716