Search code examples
pythonpython-3.xiteratormagic-methodsiterable-unpacking

How is Python's iterator unpacking (star unpacking) implemented (or, what magic methods are involved in unpacking a custom iterator?)


I am writing a class that defines __iter__ and __len__, where the value of __len__ depends on the iterator returned by __iter__. I am getting an interesting RecursionError.

Language versions: Python 3.8.6, 3.7.6. Examples are for illustrating the error only.

In the following example, Iter.__len__() attempts to unpack self, store the result in a list, and then attempts to call the built-in list.__len__() on that list to get the length.

>>> class Iter:
...     def __iter__(self):
...         return range(5).__iter__()
...     def __len__(self):
...         return list.__len__([*self])
...
>>> len(Iter())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 5, in __len__
  File "<stdin>", line 5, in __len__
  File "<stdin>", line 5, in __len__
  [Previous line repeated 993 more times]
  File "<stdin>", line 3, in __iter__
RecursionError: maximum recursion depth exceeded in comparison

However, if I define the class Iter as the following, where Iter.__len__() explicitly unpacks the iterator as returned by Iter.__iter__():

>>> class Iter:
...     def __iter__(self):
...         return range(5).__iter__()
...     def __len__(self):
...         return list.__len__([*self.__iter__()])
...
>>> len(Iter())
5

Then there is no error.

From the traceback, it seems that list.__len__() is trying to call Iter.__len__(), even thought the argument provided is supposedly already a native list object. What is the reason for the RecursionError?


According to schwobaseggl, using set instead of list will not cause a RecursionError:

>>> class Iter:
...     def __iter__(self):
...         return range(5).__iter__()
...     def __len__(self):
...         return set.__len__({*self})
...
>>> len(Iter())
5

Solution

  • It has little to do with unpacking as such, but with the implementations of different collection types, their constructors in particular.

    [*iterable]  # list
    (*iterable,) # tuple
    {*iterable}  # set
    

    all trigger calls to their classes' respective constructors.

    From the current C implementation for list(iterable):

    list___init___impl(PyListObject *self, PyObject *iterable) {
        /* ... */
        if (iterable != NULL) {
            if (_PyObject_HasLen(iterable)) {
                Py_ssize_t iter_len = PyObject_Size(iterable);
                if (iter_len == -1) {
                    if (!PyErr_ExceptionMatches(PyExc_TypeError)) {
                        return -1;
                    }
                    PyErr_Clear();
                }
                if (iter_len > 0 && self->ob_item == NULL
                    && list_preallocate_exact(self, iter_len)) {
                    return -1;
                }
            }
            PyObject *rv = list_extend(self, iterable);
            /* ... */
    }
    

    As can be seen (even with such limited C knowledge as mine), the iterable is tested for its size in order to allocate the right amount of memory which is what triggers the calls to __len__ of the passed iterable.

    Unsurprisingly, it can be verified that set does no such thing. After all, the relation between the size of the passed iterable and the size of the resulting set is nowhere near as direct as it is for lists or tuples. For instance, think of set([1] * 10**5). It would be foolish to use the size information of the passed list to allocate memory for the set.

    On a side note, as has been pointed out in the comments and many other questions/answers on this site (e.g. here):
    If you want to determine the length of an iterable, there are more (mainly space-)efficient ways than to collect all items into a Sized collection, e.g.:

    def __len__(self):
        return sum(1 for _ in self)