Search code examples
pythoncollectionsmembershipsetcmp

Understanding python object membership for sets


If I understand correctly, the __cmp__() function of an object is called in order to evaluate all objects in a collection while determining whether an object is a member, or 'in', the collection. However, this does not seem to be the case for sets:

class MyObject(object):
    def __init__(self, data):
        self.data = data

    def __cmp__(self, other):
        return self.data-other.data

a = MyObject(5)
b = MyObject(5)

print a in [b]          //evaluates to True, as I'd expect
print a in set([b])     //evaluates to False

How is an object membership tested in a set, then?


Solution

  • >>> xs = []
    >>> set([xs])
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unhashable type: 'list'
    

    There you are. Sets use hashes, very similar to dicts. This help performance extremely (membership tests are O(1), and many other operations depend on membership tests), and it also fits the semantics of sets well: Set items must be unique, and different items will produce different hashes, while same hashes indicate (well, in theory) duplicates.

    Since the default __hash__ is just id (which is rather stupid imho), two instances of a class that inherits object's __hash__ will never hash to the same value (well, unless adress space is larger than the sizeof the hash).