Just came across this awesome __length_hint__() method for iterators from PEP 424 (https://www.python.org/dev/peps/pep-0424/). Wow! A way to get the iterator length without exhausting the iterator.
My questions:
Edit: BTW, I see that __length_hint__() counts from the current position to the end, i.e. a partially consumed iterator will report the remaining length. Interesting.
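For reference, the "remaining length" behaviour can be checked with operator.length_hint, the public entry point that PEP 424 added. This is only a small sketch using a list iterator; the variable names are made up for illustration.

```python
import operator

it = iter([10, 20, 30, 40, 50])

# Nothing consumed yet: the hint covers the whole list.
full = operator.length_hint(it)

# Consume two items; the hint now covers only what is left.
next(it)
next(it)
remaining = operator.length_hint(it)

print(full, remaining)
```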
There are several answers to the question, but they are slightly missing the point: __length_hint__ is not magic. It is a protocol. If an object does not implement the protocol, that's it.
Let's take a detour and look at a + b, as it is a simple example. The + operator relies on a.__add__ and b.__radd__ to actually do something. int implements __add__ to mean arithmetic addition (1 + 2 == 3), while list implements __add__ to mean content concatenation ([1] + [2] == [1, 2]). This is because __add__ is just a protocol, to which objects must adhere if they provide it. The definition for __add__ is basically just "take another operand and return an object".
There is no separate, universal meaning to +. If the operands do not provide __add__ or __radd__, there is nothing Python can do about it.
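To make the detour concrete, here is a minimal sketch with two hypothetical classes (Meters and Opaque are invented for this illustration): one opts into the + protocol, the other does not, and Python can only raise a TypeError for the latter.

```python
class Meters:
    """Hypothetical class that opts into the + protocol."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        # Returning NotImplemented lets Python try other.__radd__ next.
        if not isinstance(other, Meters):
            return NotImplemented
        return Meters(self.value + other.value)


class Opaque:
    """Hypothetical class that provides neither __add__ nor __radd__."""


total = Meters(1) + Meters(2)

try:
    Opaque() + Opaque()
except TypeError as exc:
    # Neither operand implements the protocol, so + has no meaning here.
    error = str(exc)

print(total.value)
print(error)
```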
Coming back to the actual question(s), what does this imply?
Is there a simple explanation how does this magic work? I'm just curious.
All the magic is listed in PEP 424 but it is basically: try len(obj), fall back to obj.__length_hint__, use the default. That is all the magic.
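The lookup order above can be sketched in pure Python roughly as follows. This is a simplified approximation, not the actual implementation: length_hint_sketch is a made-up name, and the validation that operator.length_hint performs on the returned value (it must be a non-negative int) is left out.

```python
import operator


def length_hint_sketch(obj, default=0):
    """Simplified approximation of the lookup order; not the real code."""
    # Step 1: an exact length wins if the object supports len().
    try:
        return len(obj)
    except TypeError:
        pass
    # Step 2: otherwise look up __length_hint__ on the type.
    hint_method = getattr(type(obj), "__length_hint__", None)
    if hint_method is None:
        return default
    hint = hint_method(obj)
    # Step 3: an object may decline to answer by returning NotImplemented.
    if hint is NotImplemented:
        return default
    return hint


print(length_hint_sketch([1, 2, 3]))                 # uses len()
print(length_hint_sketch(iter([1, 2, 3])))           # uses __length_hint__
print(length_hint_sketch((x for x in "abc"), 7))     # falls back to the default
```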
In practice, an object has to implement __length_hint__ depending on what it knows about itself. For example, take the range_iterator (of the range backport or the Py3.6 C Code):
return self._stop - self._current
Here, the iterator knows how long it is at most, and how much it has provided. If it did not keep track of the latter, it might still return how long it is at most. Either way, it must use internal knowledge about itself.
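As an illustration of that idea, here is a self-contained sketch of an iterator that tracks its own position. RangeIteratorSketch is a hypothetical class written for this answer, not the backport's code, and the max(..., 0) guard is my own addition so the hint never goes negative.

```python
import operator


class RangeIteratorSketch:
    """Hypothetical iterator that counts from start up to (not including) stop."""

    def __init__(self, start, stop):
        self._current = start
        self._stop = stop

    def __iter__(self):
        return self

    def __next__(self):
        if self._current >= self._stop:
            raise StopIteration
        value = self._current
        self._current += 1
        return value

    def __length_hint__(self):
        # Internal knowledge: the upper bound minus what was already provided.
        return max(self._stop - self._current, 0)


it = RangeIteratorSketch(0, 5)
before = operator.length_hint(it)
next(it)
after = operator.length_hint(it)
print(before, after)
```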
Are there limitations and cases where it wouldn't work? ("hint" just sounds a bit suspicious).
Obviously, objects that don't implement __length_hint__ or __len__ don't work. Fundamentally, any object that does not have enough knowledge about its state cannot implement it.
Chained generators usually do not implement it. For example, (a ** 2 for a in range(5)) will not forward the length-hint from range. This is sensible if you consider that there may be an arbitrary chain of iterators: length_hint is only an optimization for pre-allocating space, and it may be faster to just fetch the content to put into that space.
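A quick check of that claim, assuming CPython: the range iterator reports a hint, while the generator expression wrapped around it has no __length_hint__ at all, so operator.length_hint just returns its default of 0.

```python
import operator

source = iter(range(5))
squares = (a ** 2 for a in range(5))

# The range iterator knows how many items it still has to produce.
source_hint = operator.length_hint(source)

# The generator does not forward that knowledge; the default (0) comes back.
squares_has_hint = hasattr(squares, "__length_hint__")
squares_hint = operator.length_hint(squares)

print(source_hint, squares_has_hint, squares_hint)
```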
In other cases, it may be plain impossible. Infinite and random iterators fall into this category, but also iterators over external resources.
Is there a way to get the hint for zips and generators as well? Or is it something fundamental only to iterators?
If an object does not implement __length_hint__, then no. Zip and generators don't, probably for the efficiency reasons above.
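If you still hold the inputs, you can approximate a hint yourself. The sketch below first shows that a zip object offers no hint (in CPython, as far as I can tell), then uses a hypothetical helper, zip_hint, which is my own workaround and not part of any library: it takes the smallest hint among the inputs. Note that an input without a hint contributes the default of 0, so the result can underestimate.

```python
import operator

pairs = zip([1, 2, 3], "abc")

# zip objects do not implement the protocol, so the default is returned.
has_hint = hasattr(pairs, "__length_hint__")
hint = operator.length_hint(pairs, 10)


def zip_hint(*iterables):
    """Hypothetical helper: zip stops at the shortest input, so take the minimum hint."""
    return min((operator.length_hint(it) for it in iterables), default=0)


estimate = zip_hint([1, 2, 3], "ab")
print(has_hint, hint, estimate)
```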
Also note that zip and generator objects are their own iterators.
foo = zip([1,2,3], [1,2,3])
id(foo) == id(iter(foo)) # returns True in py3.5