
Why do we bypass instance attributes during implicit lookup of special methods?


From the ‘Special method lookup for new-style classes’ section of the ‘Data model’ chapter in the Python documentation (bold emphasis mine):

For new-style classes, implicit invocations of special methods are only guaranteed to work correctly if defined on an object’s type, not in the object’s instance dictionary. That behaviour is the reason why the following code raises an exception (unlike the equivalent example with old-style classes):

>>> class C(object):
...     pass
...
>>> c = C()
>>> c.__len__ = lambda: 5
>>> len(c)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object of type 'C' has no len()

The rationale behind this behaviour lies with a number of special methods such as __hash__() and __repr__() that are implemented by all objects, including type objects. If the implicit lookup of these methods used the conventional lookup process, they would fail when invoked on the type object itself:

>>> 1 .__hash__() == hash(1)
True
>>> int.__hash__() == hash(int)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: descriptor '__hash__' of 'int' object needs an argument

Incorrectly attempting to invoke an unbound method of a class in this way is sometimes referred to as ‘metaclass confusion’, and is avoided by bypassing the instance when looking up special methods:

>>> type(1).__hash__(1) == hash(1)
True
>>> type(int).__hash__(int) == hash(int)
True

I can't quite follow the words in bold…


Solution

  • To understand what's going on here, you need to have a (basic) understanding of the conventional attribute lookup process. Take a typical introductory object-oriented programming example - fido is a Dog:

    class Dog(object):
        pass
    
    fido = Dog()
    

    If we say fido.walk(), the first thing Python does is to look for a function called walk in fido (as an entry in fido.__dict__) and call it with no arguments - so, one that's been defined something like this:

    def walk():
        print("Yay! Walking! My favourite thing!")
    
    fido.walk = walk
    

    and fido.walk() will work. If we hadn't done that, Python would look for an attribute walk in type(fido) (which is Dog) and call it with the instance as the first argument (i.e., self). That is what happens with the usual way we define methods in Python:

    class Dog(object):
        def walk(self):
            print("Yay! Walking! My favourite thing!")
    

    Now, when you call repr(fido), it ends up calling the special method __repr__. It might be (poorly, but illustratively) defined like this:

    class Dog(object):
        def __repr__(self):
            return 'Dog()'
    

    But, the bold text is saying that it also makes sense to do this:

    repr(Dog)
    

    Under the lookup process I just described, the first thing it looks for is a method called __repr__ assigned to Dog... and hey, look, there is one, because we just poorly but illustratively defined it. So, Python calls:

    Dog.__repr__()
    

    And it blows up in our face:

    >>> Dog.__repr__()
    Traceback (most recent call last):
      File "<pyshell#38>", line 1, in <module>
        Dog.__repr__()
    TypeError: __repr__() takes exactly 1 argument (0 given)
    

    because __repr__() expects a Dog instance to be passed to it as its self argument. We could do this to make it work:

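    class Dog(object):
        def __repr__(self=None):
            if self is None:
                # called as Dog.__repr__(): return the repr of Dog itself
                return "<class 'Dog'>"  # placeholder string
            # called on an instance: return the repr of self
            return 'Dog()'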
    

    But then we would need to do this every time we write a custom __repr__ function. The function also needs to know how to find the __repr__ of the class, but that is not much of a problem: it can just delegate to Dog's own class (type(Dog)) and call that class's __repr__ with Dog as its self argument:

    if self is None:
        return type(Dog).__repr__(Dog)
    

    This has two problems. First, it breaks if the class name changes in the future, since we've had to mention it twice on the same line. The bigger problem is that this is basically going to be boilerplate: nearly every implementation would either just delegate up the chain, or forget to and hence be buggy. So, Python takes the approach described in those paragraphs: repr(foo) skips looking for a __repr__ attached to foo itself, and goes straight to:

    type(foo).__repr__(foo)
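
To see both lookup paths side by side, here is a small sketch. It assumes a new-style Dog class like the one above; the lambda attached to fido and the variable names (explicit, implicit, class_repr, via_metaclass) are made up purely for illustration.

```python
class Dog(object):
    def __repr__(self):
        return 'Dog()'


fido = Dog()

# Conventional lookup: an explicit attribute access finds the entry
# stored on the instance (in fido.__dict__) first.
fido.__repr__ = lambda: 'instance-level repr'
explicit = fido.__repr__()

# Implicit lookup: repr() ignores fido.__dict__ and uses the type,
# i.e. it behaves like type(fido).__repr__(fido).
implicit = repr(fido)

# The same rule is what makes repr(Dog) work: it uses the __repr__ of
# type(Dog) (the metaclass), never Dog.__repr__ with a missing self.
class_repr = repr(Dog)
via_metaclass = type(Dog).__repr__(Dog)

print(explicit)                     # instance-level repr
print(implicit)                     # Dog()
print(class_repr == via_metaclass)  # True
```

So the attribute set on fido is still there if you ask for it by name, but repr() never looks at it, and repr(Dog) never tries to call Dog's own __repr__ without a self.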