Search code examples
pythonhashsetpython-internalsmagic-methods

Set contains for user defined classes using __hash__ function


Given:

class T:
    def __hash__(self):
        return 1234

t1 = T()
t2 = T()

my_set = { t1 }

I would expect the following to print True:

print t2 in my_set

Isn't this supposed to print True because t1 and t2 have the same hash value. How can I make the in operator of the set use the given hash function?


Solution

  • You need to define an __eq__ method because only instances that are identical a is b or equal a == b (besides having the same hash) will be recognized as equal by set and dict:

    class T:
        def __hash__(self):
            return 1234
        def __eq__(self, other):
            return True
    
    t1 = T()
    t2 = T()
    
    my_set = { t1 }
    
    print(t2 in my_set)  # True
    

    The data model on __hash__ (and the same documentation page for Python 2) explains this:

    __hash__

    Called by built-in function hash() and for operations on members of hashed collections including set, frozenset, and dict. __hash__() should return an integer. The only required property is that objects which compare equal have the same hash value; it is advised to mix together the hash values of the components of the object that also play a part in comparison of objects by packing them into a tuple and hashing the tuple.

    If a class does not define an __eq__() method it should not define a __hash__() operation either; if it defines __eq__() but not __hash__(), its instances will not be usable as items in hashable collections. If a class defines mutable objects and implements an __eq__() method, it should not implement __hash__(), since the implementation of hashable collections requires that a key’s hash value is immutable (if the object’s hash value changes, it will be in the wrong hash bucket).

    User-defined classes have __eq__() and __hash__() methods by default; with them, all objects compare unequal (except with themselves) and x.__hash__() returns an appropriate value such that x == y implies both that x is y and hash(x) == hash(y).

    (Emphasis mine)

    Note: In Python 2 you can also implement a __cmp__ method instead of __eq__.