
What is the relationship between the Python data model and built-in functions?


As I read Python answers on Stack Overflow, I continue to see some people telling users to use the data model's special methods or attributes directly.

I then see contradicting advice (sometimes from myself) saying not to do that, and instead to use builtin functions and the operators directly.

Why is that? What is the relationship between the special "dunder" methods and attributes of the Python data model and builtin functions?

When am I supposed to use the special names?


Solution

  • What is the relationship between the Python datamodel and builtin functions?

    • The builtins and operators use the underlying datamodel methods or attributes.
    • The builtins and operators have more elegant behavior and are in general more forward compatible.
    • The special methods of the datamodel are semantically non-public interfaces.
    • The builtins and language operators are specifically intended to be the user interface for behavior implemented by special methods.

    Thus, you should prefer to use the builtin functions and operators where possible over the special methods and attributes of the datamodel.

    The semantically internal APIs are more likely to change than the public interfaces. While Python doesn't actually consider anything "private" and exposes the internals, that doesn't mean it's a good idea to abuse that access. Doing so has the following risks:

    • You may find you have more breaking changes when upgrading your Python executable or switching to other implementations of Python (like PyPy, IronPython, Jython, or some other unforeseen implementation).
    • Your colleagues will likely think poorly of your language skills and conscientiousness, and consider it a code-smell, bringing you and the rest of your code to greater scrutiny.
    • Behavior routed through the builtin functions is easy to intercept. Calling special methods directly limits your ability to introspect and debug your Python code.

    In depth

    The builtin functions and operators invoke the special methods and use the special attributes in the Python datamodel. They are the readable and maintainable veneer that hides the internals of objects. In general, users should use the builtins and operators given in the language as opposed to calling the special methods or using the special attributes directly.

    The builtin functions and operators also can have fallback or more elegant behavior than the more primitive datamodel special methods. For example:

    • next(obj, default) allows you to provide a default instead of raising StopIteration when an iterator runs out, while obj.__next__() does not.
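    • str(obj) falls back to obj.__repr__() when the class defines no __str__ of its own - whereas calling obj.__str__() directly raises an AttributeError on objects that lack the method (such as instances of Python 2 old-style classes).
    • obj != other falls back to not obj == other in Python 3 when no __ne__ is defined, and it also tries the reflected other.__ne__(obj) - calling obj.__ne__(other) directly skips the operator's dispatch.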
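
The fallbacks listed above can be seen in a short Python 3 session. This is only a sketch: Point and Label are hypothetical classes made up for the illustration.

```python
class Point:
    """Hypothetical class that defines __eq__ but not __ne__."""

    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return self.x == other.x


class Label:
    """Hypothetical class that defines __repr__ but not __str__."""

    def __repr__(self):
        return "Label()"


numbers = iter([1])
first = next(numbers, None)      # the only item
exhausted = next(numbers, None)  # the default, instead of StopIteration

try:
    numbers.__next__()           # the special method has no default to offer
    raised = False
except StopIteration:
    raised = True

differs = Point(1) != Point(2)   # derived from __eq__ in Python 3
text = str(Label())              # str() ends up using __repr__
```

Here exhausted is None, raised is True, differs is True, and text is 'Label()'.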

    (Builtin functions can also be easily shadowed, if necessary or desirable, in a module's global scope or in the builtins module, to further customize behavior.)

    Mapping the builtins and operators to the datamodel

    Here is a mapping, with notes, of the builtin functions and operators to the respective special methods and attributes that they use or return. The usual rule is that a builtin function maps to a special method of the same name, but this is not consistent enough to rely on, which is why the map below is worth giving:

    builtins/     special methods/
    operators  -> datamodel               NOTES (fb == fallback)
    
    repr(obj)     obj.__repr__()          provides fb behavior for str
    str(obj)      obj.__str__()           fb to __repr__ if no __str__
    bytes(obj)    obj.__bytes__()         Python 3 only
    unicode(obj)  obj.__unicode__()       Python 2 only
    format(obj)   obj.__format__()        format spec optional.
    hash(obj)     obj.__hash__()
    bool(obj)     obj.__bool__()          Python 3, fb to __len__
    bool(obj)     obj.__nonzero__()       Python 2, fb to __len__
    dir(obj)      obj.__dir__()
    vars(obj)     obj.__dict__            does not include __slots__
    type(obj)     obj.__class__           type actually bypasses __class__ -
                                          overriding __class__ will not affect type
    help(obj)     obj.__doc__             help uses more than just __doc__
    len(obj)      obj.__len__()           provides fb behavior for bool
    iter(obj)     obj.__iter__()          fb to __getitem__ w/ indexes from 0 on
    next(obj)     obj.__next__()          Python 3
    next(obj)     obj.next()              Python 2
    reversed(obj) obj.__reversed__()      fb to __len__ and __getitem__
    other in obj  obj.__contains__(other) fb to __iter__ then __getitem__
    obj == other  obj.__eq__(other)
    obj != other  obj.__ne__(other)       fb to not obj.__eq__(other) in Python 3
    obj < other   obj.__lt__(other)       get >, >=, <= with @functools.total_ordering
    complex(obj)  obj.__complex__()
    int(obj)      obj.__int__()
    float(obj)    obj.__float__()
    round(obj)    obj.__round__()
    abs(obj)      obj.__abs__()
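
A few of the fallbacks noted in the table can be checked with a small Python 3 sketch. Deck and Impostor are hypothetical classes invented for the illustration; Deck defines only __len__ and __getitem__, and Impostor overrides __class__.

```python
class Deck:
    """Hypothetical container defining only __len__ and __getitem__."""

    def __init__(self, cards):
        self._cards = list(cards)

    def __len__(self):
        return len(self._cards)

    def __getitem__(self, index):
        return self._cards[index]


class Impostor:
    """Hypothetical class whose __class__ attribute claims to be int."""

    @property
    def __class__(self):
        return int


empty, full = Deck([]), Deck(["A", "K", "Q"])

truthiness = (bool(empty), bool(full))  # no __bool__, so __len__ decides
forward = list(full)                    # iter() indexes from 0 until IndexError
backward = list(reversed(full))         # uses __len__ and __getitem__
found = "K" in full                     # no __contains__, so it iterates

impostor = Impostor()
real_type = type(impostor)              # type() ignores the overridden __class__
claimed_type = impostor.__class__       # the attribute reports int
```

With these definitions, truthiness is (False, True), backward is ['Q', 'K', 'A'], and real_type is Impostor even though claimed_type is int.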
    

    The operator module has length_hint, which falls back to the respective special method if __len__ is not implemented:

    length_hint(obj)  obj.__length_hint__() 
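
As a minimal sketch of that fallback, Countdown below is a hypothetical iterator that has no __len__ but can estimate how many items remain:

```python
from operator import length_hint


class Countdown:
    """Hypothetical iterator with __length_hint__ but no __len__."""

    def __init__(self, start):
        self.remaining = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.remaining <= 0:
            raise StopIteration
        self.remaining -= 1
        return self.remaining

    def __length_hint__(self):
        return self.remaining


hinted = length_hint(Countdown(3))    # no __len__, so __length_hint__ is used
exact = length_hint([1, 2])           # lists have __len__, so that is used
unknown = length_hint(object(), 10)   # neither method, so the default is returned
```

The three results are 3, 2, and 10 respectively.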
    

    Dotted Lookups

    Dotted lookups are contextual. Without a special method implementation, Python first looks in the class hierarchy for data descriptors (like properties and slots), then in the instance __dict__ (for instance variables), then in the class hierarchy for non-data descriptors (like methods). Special methods implement the following behaviors:

    obj.attr      obj.__getattr__('attr')       provides fb if dotted lookup fails
    obj.attr      obj.__getattribute__('attr')  preempts dotted lookup
    obj.attr = _  obj.__setattr__('attr', _)    preempts dotted lookup
    del obj.attr  obj.__delattr__('attr')       preempts dotted lookup
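
The lookup order described above can be observed with a small sketch. Record is a hypothetical class, and the entries are written into the instance dictionary through vars() purely to set up the name clashes:

```python
class Record:
    """Hypothetical class used to observe attribute lookup order."""

    @property
    def kind(self):              # a property is a data descriptor
        return "from property"

    def describe(self):          # a plain function is a non-data descriptor
        return "from method"

    def __getattr__(self, name):  # called only when normal lookup fails
        return "fallback for " + name


record = Record()
# Put same-named entries directly into the instance dictionary.
vars(record)["kind"] = "from instance dict"
vars(record)["describe"] = "from instance dict"

kind = record.kind          # the data descriptor wins over the instance dict
describe = record.describe  # the instance dict wins over the non-data descriptor
missing = record.missing    # nothing found anywhere, so __getattr__ answers
```

Here kind is 'from property', describe is 'from instance dict', and missing is 'fallback for missing'.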
    

    Descriptors

    Descriptors are a bit advanced - feel free to skip these entries and come back later - recall the descriptor instance is in the class hierarchy (like methods, slots, and properties). A data descriptor implements either __set__ or __delete__:

    obj.attr        descriptor.__get__(obj, type(obj)) 
    obj.attr = val  descriptor.__set__(obj, val)
    del obj.attr    descriptor.__delete__(obj)
    

    When the class is created (defined), the descriptor method __set_name__ is called on any descriptor that has it, to inform the descriptor of its attribute name. (This is new in Python 3.6.) cls is the same as type(obj) above, and 'attr' stands in for the attribute name:

    class cls:
        @descriptor_type
        def attr(self): pass # -> descriptor.__set_name__(cls, 'attr') 
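
To make the descriptor entries concrete, here is a sketch of a data descriptor that also uses __set_name__. Positive and Order are hypothetical names, and storing the value in the instance dictionary under the same name is just one possible design:

```python
class Positive:
    """Hypothetical data descriptor that rejects non-positive values."""

    def __set_name__(self, owner, name):
        # Called once, when the owning class body has been executed.
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:          # accessed on the class, not an instance
            return self
        return vars(obj)[self.name]

    def __set__(self, obj, value):
        if value <= 0:
            raise ValueError(self.name + " must be positive")
        vars(obj)[self.name] = value


class Order:
    quantity = Positive()        # triggers Positive.__set_name__(Order, 'quantity')

    def __init__(self, quantity):
        self.quantity = quantity  # goes through Positive.__set__


order = Order(3)
stored = order.quantity          # goes through Positive.__get__
learned_name = Order.quantity.name

try:
    Order(0)
    rejected = False
except ValueError:
    rejected = True
```

Because Positive defines __set__, it is a data descriptor and takes precedence over the same-named entry in the instance dictionary.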
    

    Items (subscript notation)

    The subscript notation is also contextual:

    obj[name]         -> obj.__getitem__(name)
    obj[name] = item  -> obj.__setitem__(name, item)
    del obj[name]     -> obj.__delitem__(name)
    

    As a special case for subclasses of dict, __missing__ is called if __getitem__ doesn't find the key:

    obj[name]         -> obj.__missing__(name)  
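
A minimal sketch of __missing__, using a hypothetical dict subclass named ZeroDict that reports 0 for absent keys without storing them:

```python
class ZeroDict(dict):
    """Hypothetical dict subclass that returns 0 for missing keys."""

    def __missing__(self, key):
        return 0


counts = ZeroDict(a=1)

present = counts["a"]        # found normally, __missing__ is not involved
absent = counts["b"]         # not found, so __missing__('b') supplies the value
still_absent = "b" in counts  # __missing__ did not store anything
via_get = counts.get("b")     # dict.get does not consult __missing__
```

Only the subscript notation triggers __missing__: absent is 0, while still_absent is False and via_get is None.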
    

    Operators

    There are also special methods for +, -, *, @, /, //, %, divmod(), pow(), **, <<, >>, &, ^, | operators, for example:

    obj + other   ->  obj.__add__(other), fallback to other.__radd__(obj)
    obj | other   ->  obj.__or__(other), fallback to other.__ror__(obj)
    

    and in-place operators for augmented assignment, +=, -=, *=, @=, /=, //=, %=, **=, <<=, >>=, &=, ^=, |=, for example:

    obj += other  ->  obj.__iadd__(other)
    obj |= other  ->  obj.__ior__(other)
    

    (If an in-place operator is not defined, Python falls back to the binary operator: for example, obj += other becomes obj = obj + other.)

    and unary operations:

    +obj          ->  obj.__pos__()
    -obj          ->  obj.__neg__()
    ~obj          ->  obj.__invert__()
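
The operator fallbacks can be sketched with a hypothetical Meters class that defines __add__, __radd__, and __neg__, but deliberately no __iadd__:

```python
class Meters:
    """Hypothetical value type supporting +, reflected +, and unary -."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        if isinstance(other, Meters):
            return Meters(self.value + other.value)
        if isinstance(other, (int, float)):
            return Meters(self.value + other)
        return NotImplemented    # lets Python try the other operand

    def __radd__(self, other):
        # Reached when the left operand's __add__ returned NotImplemented.
        return self + other

    def __neg__(self):
        return Meters(-self.value)


total = 1 + Meters(2)            # int cannot add Meters, so __radd__ is used

length = Meters(1)
original = length
length += Meters(2)              # no __iadd__, so this is length = length + Meters(2)

negated = -Meters(2)
```

Because the += falls back to __add__, length is rebound to a new object and original is left unchanged.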
    

    Context Managers

    A context manager defines __enter__, which is called on entering the code block (its return value, usually self, is aliased with as), and __exit__, which is guaranteed to be called on leaving the code block, with exception information.

    with obj as enters_return_value: #->  enters_return_value = obj.__enter__()
        raise Exception('message')
                                     #->  obj.__exit__(Exception, 
                                     #->               Exception('message'), 
                                     #->               traceback_object)
    

    If __exit__ receives an exception and returns a false value, the exception is re-raised once __exit__ returns.

    If there is no exception, __exit__ receives None for those three arguments instead, and its return value is ignored:

    with obj:           #->  obj.__enter__()
        pass
                        #->  obj.__exit__(None, None, None)
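
The protocol can be traced with a hypothetical Recorder context manager that logs each call and returns a configurable value from __exit__ (a sketch, not a recommended pattern for swallowing exceptions):

```python
class Recorder:
    """Hypothetical context manager that records the calls it receives."""

    def __init__(self, suppress):
        self.suppress = suppress
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.events.append(("exit", exc_type))
        return self.suppress     # a true value swallows the exception


quiet = Recorder(suppress=True)
with quiet as entered:
    raise ValueError("swallowed")

loud = Recorder(suppress=False)
try:
    with loud:
        raise ValueError("propagated")
except ValueError:
    propagated = True
else:
    propagated = False

clean = Recorder(suppress=False)
with clean:
    pass
```

In all three cases __exit__ is called; only the loud instance lets the exception escape the with block.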
    

    Some Metaclass Special Methods

    Similarly, classes can have special methods (from their metaclasses) that support abstract base classes:

    isinstance(obj, cls) -> cls.__instancecheck__(obj)
    issubclass(sub, cls) -> cls.__subclasscheck__(sub)
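
A sketch of these hooks, assuming a hypothetical DuckMeta metaclass that treats anything with a callable quack attribute as a Duck:

```python
class DuckMeta(type):
    """Hypothetical metaclass that customizes isinstance and issubclass."""

    def __instancecheck__(cls, obj):
        return callable(getattr(obj, "quack", None))

    def __subclasscheck__(cls, sub):
        return callable(getattr(sub, "quack", None))


class Duck(metaclass=DuckMeta):
    pass


class Robot:                     # unrelated to Duck by inheritance
    def quack(self):
        return "beep"


robot_is_duck = isinstance(Robot(), Duck)   # -> DuckMeta.__instancecheck__
robot_subclass = issubclass(Robot, Duck)    # -> DuckMeta.__subclasscheck__
number_is_duck = isinstance(42, Duck)
```

The first two checks are True although Robot does not inherit from Duck, and the last is False.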
    

    An important takeaway is that while builtins like next and bool do not change between Python 2 and 3, the names of the underlying special methods do.

    Thus using the builtins also offers more forward compatibility.

    When am I supposed to use the special names?

    In Python, names that begin with underscores are semantically non-public names for users. The underscore is the creator's way of saying, "hands-off, don't touch."

    This is not just cultural; it is also reflected in Python's treatment of APIs. When a package's __init__.py uses import * to provide an API from a subpackage, and the subpackage does not provide an __all__, names that start with underscores are excluded. The subpackage's __name__ would also be excluded.
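
The import * behavior can be reproduced without creating files. The sketch below builds a throwaway module in memory; the module name demo_api and both attribute names are made up for the illustration:

```python
import sys
import types

# Build a hypothetical module with one public and one non-public name.
demo = types.ModuleType("demo_api")
demo.public_name = 1
demo._private_name = 2
sys.modules["demo_api"] = demo   # make it importable by name

namespace = {}
exec("from demo_api import *", namespace)

# Ignore the __builtins__ entry that exec adds to the namespace.
exported = sorted(name for name in namespace if not name.startswith("__"))

del sys.modules["demo_api"]      # clean up the temporary registration
```

Since demo_api defines no __all__, exported contains only 'public_name'.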

    IDE autocompletion tools are mixed in their consideration of names that start with underscores to be non-public. However, I greatly appreciate not seeing __init__, __new__, __repr__, __str__, __eq__, etc. (nor any of the user created non-public interfaces) when I type the name of an object and a period.

    Thus I assert:

    The special "dunder" methods are not a part of the public interface. Avoid using them directly.

    So when to use them?

    The main use-case is when implementing your own custom object or subclass of a builtin object.

    Try to only use them when absolutely necessary. Here are some examples:

    Use the __name__ special attribute on functions or classes

    When we decorate a function, we typically get a wrapper function in return that hides helpful information about the function. We would use the @wraps(fn) decorator to make sure we don't lose that information, but if we need the name of the function, we need to use the __name__ attribute directly:

    from functools import wraps
    
    def decorate(fn): 
        @wraps(fn)
        def decorated(*args, **kwargs):
            print('calling fn,', fn.__name__) # exception to the rule
            return fn(*args, **kwargs)
        return decorated
    

    Similarly, I do the following when I need the name of the object's class in a method (used in, for example, a __repr__):

    def get_class_name(self):
        return type(self).__name__
              # ^          # ^- must use __name__, no builtin e.g. name()
              # use type, not .__class__
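
For instance, a __repr__ written this way reports the right name for subclasses too. Vector and Velocity are hypothetical classes used only for this sketch:

```python
class Vector:
    """Hypothetical class whose repr names the instance's actual class."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # type(self) picks up the subclass, so the name is never hard-coded.
        return "{}({!r}, {!r})".format(type(self).__name__, self.x, self.y)


class Velocity(Vector):
    pass


base_repr = repr(Vector(1, 2))
sub_repr = repr(Velocity(1, 2))
```

Here base_repr is 'Vector(1, 2)' and sub_repr is 'Velocity(1, 2)'.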
    

    Using special attributes to write custom classes or subclassed builtins

    When we want to define custom behavior, we must use the data-model names.

    This makes sense: since we are the implementors, these attributes aren't private to us.

    class Foo(object):
        # required here to implement == for instances:
        def __eq__(self, other):      
            # but we still use == for the values:
            return self.value == other.value
        # required here to implement != for instances:
        def __ne__(self, other): # docs recommend for Python 2.
            # use the higher level of abstraction here:
            return not self == other  
    

    However, even in this case, we don't use self.value.__eq__(other.value) or not self.__eq__(other) (see my answer here for proof that the latter can lead to unexpected behavior.) Instead, we should use the higher level of abstraction.
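
One way a direct call can differ from the operator is sketched below. Tagged is a hypothetical variant of Foo whose __eq__ returns NotImplemented for unrelated types, which is an assumption of this example rather than something the class above does:

```python
class Tagged:
    """Hypothetical class whose __eq__ declines to compare with other types."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, Tagged):
            return NotImplemented
        return self.value == other.value


direct = Tagged(1).__eq__("text")   # the raw special method result
via_operator = Tagged(1) == "text"  # the operator resolves it to a bool
not_equal = Tagged(1) != "text"
```

The direct call returns the NotImplemented sentinel, which is not a meaningful truth value, while the operators return False and True as expected.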

    Another point at which we'd need to use the special method names is when we are in a child's implementation, and want to delegate to the parent. For example:

    class NoisyFoo(Foo):
        def __eq__(self, other):
            print('checking for equality')
            # required here to call the parent's method
            return super(NoisyFoo, self).__eq__(other) 
    

    Conclusion

    The special methods allow users to implement the interface for object internals.

    Use the builtin functions and operators wherever you can. Only use the special methods where there is no documented public API.