
Remove duplicate mutable objects from a list


When I have a list of immutable objects, lst, and want to get rid of duplicates, I can just use set(lst):

lst = [0,4,2,6,3,6,4,9,2,2] # integers are immutable in python
print(set(lst)) # {0,2,3,4,6,9}

However, suppose I have a list of mutable objects, lst, and want to get rid of duplicates. set(lst) won't work because mutable containers like dicts are not hashable - we'd get a TypeError: unhashable type: '<type>'. What should we do in this case?

For example, suppose we have lst, a list of dicts (dicts are mutable and thus not hashable) and some dicts occur multiple times in lst:

d0 = {0:'a', 1:'b', 9:'j'}
d1 = {'jan':1, 'jul':7, 'dec':12}
d2 = {'hello':'hola', 'goodbye':'adios', 'happy':'feliz', 'sad':'triste'}
lst = [d0, d1, d1, d0, d2, d1, d0]

We want to iterate through lst, but only consider each dict once. If we do set(lst), we'd get a TypeError: unhashable type: 'dict'. Instead we have to do something like:

def dedup(lst):
  seen_ids = set()
  for elem in lst:
    id_ = id(elem)
    if id_ not in seen_ids:
      seen_ids.add(id_)
      yield elem
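To be clear about what this does: it deduplicates by identity, not by content. Here is a self-contained run (the names a and b are just for illustration) showing that two distinct dicts with equal content both survive:

```python
def dedup(lst):
  seen_ids = set()
  for elem in lst:
    if id(elem) not in seen_ids:
      seen_ids.add(id(elem))
      yield elem

a = {0: 'a'}
b = {0: 'a'}  # equal content, but a distinct object with its own id
print(list(dedup([a, b, a])))  # -> [{0: 'a'}, {0: 'a'}]: b is not filtered out
```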

Is there a better way to do this?


Solution

  • Working with object ids is fragile: it deduplicates by identity, not by content, so two distinct dicts with the same content are both kept. If you want content-based deduplication, you can use the json module to serialize each dict to a string, since strings are hashable.

    import json
    
    def dedup(lst):
        # sort_keys=True so that dicts with equal content but different
        # key insertion order serialize to the same string
        myset = {json.dumps(x, sort_keys=True) for x in lst}  # serialize to a set
        return [json.loads(x) for x in myset]  # deserialize back to a list
    # Caveats: json.loads turns non-string keys (e.g. the int keys of d0)
    # into strings, and set iteration order is arbitrary.
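If the json round-trip's side effects matter (non-string keys become strings, and the originals are replaced by copies), a plain equality-based scan is an alternative. A sketch, with the helper name dedup_by_equality chosen here just for illustration:

```python
def dedup_by_equality(lst):
    """Keep the first occurrence of each value, comparing with ==.

    Works for unhashable elements such as dicts. Quadratic in the
    number of distinct elements, but preserves order and returns
    the original objects unchanged.
    """
    result = []
    for elem in lst:
        if elem not in result:  # `in` compares with ==, no hashing needed
            result.append(elem)
    return result

d0 = {0: 'a', 1: 'b', 9: 'j'}
d1 = {'jan': 1, 'jul': 7, 'dec': 12}
print(dedup_by_equality([d0, d1, d1, d0]))  # -> [d0, d1]
```

This is O(n*m) for m distinct elements, so it only suits small lists, but it sidesteps serialization entirely.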