I am writing a class that defines __iter__
and __len__
, where the value of __len__
depends on the iterator returned by __iter__
. I am getting an interesting RecursionError
.
Language versions: Python 3.8.6, 3.7.6. Examples are for illustrating the error only.
In the following example, Iter.__len__()
attempts to unpack self
, store the result in a list
, and then attempts to call the built-in list.__len__()
on that list to get the length.
>>> class Iter:
... def __iter__(self):
... return range(5).__iter__()
... def __len__(self):
... return list.__len__([*self])
...
>>> len(Iter())
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 5, in __len__
File "<stdin>", line 5, in __len__
File "<stdin>", line 5, in __len__
[Previous line repeated 993 more times]
File "<stdin>", line 3, in __iter__
RecursionError: maximum recursion depth exceeded in comparison
However, if I define the class Iter
as the following, where Iter.__len__()
explicitly unpacks the iterator as returned by Iter.__iter__()
:
>>> class Iter:
... def __iter__(self):
... return range(5).__iter__()
... def __len__(self):
... return list.__len__([*self.__iter__()])
...
>>> len(Iter())
5
Then there is no error.
From the traceback, it seems that list.__len__()
is trying to call Iter.__len__()
, even thought the argument provided is supposedly already a native list
object. What is the reason for the RecursionError
?
According to schwobaseggl, using set
instead of list
will not cause a RecursionError
:
>>> class Iter:
... def __iter__(self):
... return range(5).__iter__()
... def __len__(self):
... return set.__len__({*self})
...
>>> len(Iter())
5
It has little to do with unpacking as such, but with the implementations of different collection types, their constructors in particular.
[*iterable] # list
(*iterable,) # tuple
{*iterable} # set
all trigger calls to their classes' respective constructors.
From the current C implementation for list(iterable)
:
list___init___impl(PyListObject *self, PyObject *iterable) {
/* ... */
if (iterable != NULL) {
if (_PyObject_HasLen(iterable)) {
Py_ssize_t iter_len = PyObject_Size(iterable);
if (iter_len == -1) {
if (!PyErr_ExceptionMatches(PyExc_TypeError)) {
return -1;
}
PyErr_Clear();
}
if (iter_len > 0 && self->ob_item == NULL
&& list_preallocate_exact(self, iter_len)) {
return -1;
}
}
PyObject *rv = list_extend(self, iterable);
/* ... */
}
As can be seen (even with such limited C knowledge as mine), the iterable is tested for its size in order to allocate the right amount of memory which is what triggers the calls to __len__
of the passed iterable.
Unsurprisingly, it can be verified that set
does no such thing. After all, the relation between the size of the passed iterable and the size of the resulting set is nowhere near as direct as it is for lists or tuples. For instance, think of set([1] * 10**5)
. It would be foolish to use the size information of the passed list to allocate memory for the set.
On a side note, as has been pointed out in the comments and many other questions/answers on this site (e.g. here):
If you want to determine the length of an iterable
, there are more (mainly space-)efficient ways than to collect all items into a Sized
collection, e.g.:
def __len__(self):
return sum(1 for _ in self)