currently, I do some research with the built-in types of Python. I am confused what methods are invoked to check if a key is in a dictionary. For example, if I check if a key of type int
is in a dictionary, the __eq__()
method is invoked in the background only if dictionary.keys() contains it. If not __eq__()
is not invoked.
Here is an code example:
dict = {
1: "Hello",
2: "World",
4: "Foo"
}
assert 1 in dict.keys() # the __eq__() method of int is invoked
assert not(3 in dict.keys()) # no __eq__() method of int is invoked
I know, that the dictionary holds a tuple of hash(key), key and value. But I am a little bit confused, why __eq__()
is not invoked in the second assert.
To test this behavior I inherited from int
and set some breakpoints. Here is an extract of my custom int
class:
tint(int):
def __new__(cls, value, *args, **kw):
return super(tint, cls).__new__(cls, value)
def __init__(self, value):
super().__init__()
def __eq__(self, other):
return super().__eq__(other) # with breakpoints
def __ne__(self, other):
return super().__ne__(other) # with breakpoints
def __hash__(self):
return tint(super().__hash__()) # with breakpoints
I use Python version 3.6.5 on Ubuntu 18.04.1 LTS.
To find a key in a dict, you first find candidates by hash, then you see whether it is a fluke (the objects are different even though the hash is the same) or really the key you want. So if you can't find anything under that hash, there's nothing to eq with.
Imagine you're looking for an acquaintance at a party, and don't know if she's here yet. She has red hair. Typically, you'd look for red-haired girls, then go face them to see whether it's really her or not. It makes no sense to check each and every person at the party if no red-haired girls arrived yet. (Assuming your friend is not into daily dye-jobs.)
EDIT: CPython stores dicts in an array, where the primary array position is determined by the hash; if that position is occupied, it skips to a next candidate location in a mathematically deterministic way. Since a filled location can consequently hold either the "correct" hash or an unrelated hash, when looking up a hash CPython will start from the primary location, then go on comparing hashes till it decides it's impossible the searched-for key is there. The hashes are at this point plain low-level integers, not Python objects, which explains why hash comparisons don't trigger __eq__
.
Note a cute optimisation in the source: for each candidate, CPython first checks object identity; only then does it check whether hashes are still same, and if so, goes to the slow check to see whether objects are equal (using PyObject_RichCompareBool
, which ultimately calls __eq__
). Why does this matter? Look here:
class Foo:
def __eq__(self, other):
return False # This shouldn't match...
def __hash__(self):
return 7
f = Foo()
d = { f: "Yes!" }
print(d[f]) # ...and yet it does! :)
# => "Yes!"