
Python - class __hash__ method and set


I'm using set() and the __hash__ method of a Python class to prevent objects with the same hash from being added to a set more than once. According to the Python data model documentation, set() treats objects with the same hash as the same object and adds them only once.

But it behaves differently, as shown below:

class MyClass(object):

    def __hash__(self):
        return 0

result = set()
result.add(MyClass())
result.add(MyClass())

print(len(result)) # len = 2

With string values, however, it works as expected:

result.add('aida')
result.add('aida')

print(len(result)) # len = 1

My question is: why are objects with the same hash not treated as the same object in a set?


Solution

  • Your reading is incorrect. The __eq__ method is used for equality checks. The documentation only states that the __hash__ value must also be the same for two objects a and b for which a == b (i.e. a.__eq__(b)) is true. A set treats two objects as the same element only if their hashes match and they also compare equal.

    This is a common logic mistake: a == b being true implies that hash(a) == hash(b) is also true. However, an implication is not an equivalence: the converse, that hash(a) == hash(b) implies a == b, does not follow.
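
To see the one-way implication concretely, here is a minimal sketch. SameHash is a made-up class for illustration only (it mirrors the MyClass from the question), and the expected values in the comments assume the default identity-based __eq__:

```python
class SameHash:
    """Hypothetical class: every instance hashes to 0, equality is left as identity."""

    def __hash__(self):
        return 0


a, b = SameHash(), SameHash()

# Equal hashes, but the default __eq__ compares identity, so a != b.
print(hash(a) == hash(b))  # True
print(a == b)              # False

# The set keeps both: an element counts as a duplicate only if the
# hashes match AND the objects compare equal.
print(len({a, b}))         # 2

# The implication only runs one way: objects that compare equal must hash equally.
print(1 == 1.0, hash(1) == hash(1.0))  # True True
print(len({1, 1.0}))                   # 1
```

In other words, the hash only narrows down where to look; the final decision is made by ==.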

    To make all instances of MyClass compare equal to each other, you need to provide an __eq__ method for them; otherwise Python will compare their identities instead. This might do:

    class MyClass(object):
        def __hash__(self):
            return 0
        def __eq__(self, other):
            # another object is equal to self, iff 
            # it is an instance of MyClass
            return isinstance(other, MyClass)
    

    Now:

    >>> result = set()
    >>> result.add(MyClass())
    >>> result.add(MyClass())
    >>> len(result)
    1
    

    In practice, you'd base __hash__ on the same attributes of your object that __eq__ uses for comparison, for example:

    class Person:
        def __init__(self, name, ssn):
            self.name = name
            self.ssn = ssn
    
        def __eq__(self, other):
            return isinstance(other, Person) and self.ssn == other.ssn
    
        def __hash__(self):
            # use the hashcode of self.ssn since that is used
            # for equality checks as well
            return hash(self.ssn)
    
    p = Person('Foo Bar', 123456789)
    q = Person('Fake Name', 123456789)
    print(len({p, q}))  # 1
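
As an alternative to writing __eq__ and __hash__ by hand, the standard library's dataclasses module can generate a matching pair. The following is only a sketch under some assumptions: PersonRecord is a hypothetical name chosen to avoid clashing with the Person class above, and it is made frozen so that a hash is generated. Note one difference from the hand-written version: the generated __eq__ requires both objects to be of exactly the same class, rather than using isinstance.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PersonRecord:
    # compare=False leaves name out of the generated __eq__,
    # and therefore out of the generated __hash__ as well.
    name: str = field(compare=False)
    ssn: int


p = PersonRecord('Foo Bar', 123456789)
q = PersonRecord('Fake Name', 123456789)

print(p == q)        # True, only ssn is compared
print(len({p, q}))   # 1
```

Because both methods are derived from the same field list, they cannot drift apart the way two hand-written methods can.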