When I have a list of immutable objects, lst, and want to get rid of duplicates, I can just use set(lst):
lst = [0, 4, 2, 6, 3, 6, 4, 9, 2, 2]  # integers are immutable in Python
print(set(lst))  # {0, 2, 3, 4, 6, 9}
However, suppose I have a list of mutable objects, lst, and want to get rid of duplicates. set(lst) won't work because mutable built-in objects like dicts are not hashable; we'd get a TypeError: unhashable type: '<type>'. What should we do in this case?
For example, suppose we have lst, a list of dicts (dicts are mutable and thus not hashable), and some dicts occur multiple times in lst:
d0 = {0:'a', 1:'b', 9:'j'}
d1 = {'jan':1, 'jul':7, 'dec':12}
d2 = {'hello':'hola', 'goodbye':'adios', 'happy':'feliz', 'sad':'triste'}
lst = [d0, d1, d1, d0, d2, d1, d0]
We want to iterate through lst, but only consider each dict once. If we do set(lst), we get a TypeError: unhashable type: 'dict'. Instead we have to do something like:
def dedup(lst):
    seen_ids = set()
    for elem in lst:
        id_ = id(elem)
        if id_ not in seen_ids:
            seen_ids.add(id_)
            yield elem
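With the example list, this yields each dict exactly once, in first-seen order:

result = list(dedup(lst))
print(result == [d0, d1, d2])  # True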
Is there a better way to do this???
Working with object ids is dangerous: id() tracks identity, not equality, so if the list contains two distinct dicts with the same content, both slip through the dedup.
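A quick check of that failure mode (the dicts here are just throwaway examples):

a = {'x': 1}
b = {'x': 1}           # equal content, but a distinct object
print(a == b)          # True
print(id(a) == id(b))  # False -- so the id-based dedup keeps both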
Instead, you can use the json module to serialize the dicts as strings, which are hashable:
import json

def dedup(lst):
    # sort_keys=True makes equal dicts serialize identically
    myset = {json.dumps(x, sort_keys=True) for x in lst}  # serialize to a set
    # caveat: non-string keys come back as strings ("0", not 0)
    return [json.loads(x) for x in myset]  # deserialize to a list
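If you want to keep the original objects, so that int keys and first-seen order survive, a minimal sketch along the same lines, assuming every element is JSON-serializable, is to use the serialized string only as the "seen" key (dedup_keep_originals is just an illustrative name):

import json

def dedup_keep_originals(lst):
    # use the canonical serialization only to detect duplicates;
    # return the original objects, so key types stay untouched
    seen = set()
    result = []
    for elem in lst:
        key = json.dumps(elem, sort_keys=True)
        if key not in seen:
            seen.add(key)
            result.append(elem)
    return result

print(dedup_keep_originals(lst) == [d0, d1, d2])  # True

Unlike the round-trip version, this also dedups two distinct dicts that happen to have the same content, since the comparison is by serialized content rather than by object identity.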