Tags: python, python-3.x, generator, python-internals

PEP 424 __length_hint__() - Is there a way to do the same for generators or zips?


Just came across this awesome __length_hint__() method for iterators from PEP 424 (https://www.python.org/dev/peps/pep-0424/). Wow! A way to get the iterator length without exhausting the iterator.

My questions:

  1. Is there a simple explanation of how this magic works? I'm just curious.
  2. Are there limitations and cases where it wouldn't work? ("hint" just sounds a bit suspicious).
  3. Is there a way to get the hint for zips and generators as well? Or is it something fundamental only to iterators?

Edit: BTW, I see that __length_hint__() counts from the current position to the end, i.e. a partially consumed iterator will report the remaining length. Interesting.


Solution

  • There are several answers to the question, but they are slightly missing the point: __length_hint__ is not magic. It is a protocol. If an object does not implement the protocol, that's it.


    Let's take a detour and look at a + b, as it is a simple example. The + operator relies on a.__add__ and b.__radd__ to actually do something. int implements __add__ to mean arithmetic addition (1 + 2 == 3), while list implements __add__ to mean content concatenation ([1] + [2] == [1, 2]). This is because __add__ is just a protocol, to which objects must adhere if they provide it. The definition for __add__ is basically just "take another operand and return an object".

    There is no separate, universal meaning to +. If the operands do not provide __add__ or __radd__, there is nothing Python can do about it.
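    To make the detour concrete, here is a minimal sketch. Meters and Opaque are hypothetical classes invented for illustration: one opts into the + protocol, the other does not.

```python
class Meters:
    """Hypothetical class that opts into the + protocol via __add__."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        if isinstance(other, Meters):
            return Meters(self.value + other.value)
        # Tell Python this operand cannot handle the operation, so it can
        # try the other operand's __radd__ or raise TypeError.
        return NotImplemented


class Opaque:
    """Hypothetical class that implements neither __add__ nor __radd__."""


total = Meters(1) + Meters(2)  # works because Meters defines __add__

try:
    Opaque() + Opaque()
    add_failed = False
except TypeError:
    # Without the protocol there is nothing Python can do.
    add_failed = True
```

    The same reasoning applies to __length_hint__: an object either implements the hook or it does not.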


    Coming back to the actual question(s), what does this imply?

    Is there a simple explanation of how this magic works? I'm just curious.

    All the magic is listed in PEP 424, but it is basically: try len(obj), fall back to obj.__length_hint__(), and otherwise use the default. That is all the magic.
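    The lookup order can be sketched in pure Python. sketch_length_hint is a hypothetical name and a simplified paraphrase of the pseudo-code in PEP 424, not the actual implementation. The real entry point is operator.length_hint, which the sketch is compared against.

```python
from operator import length_hint


def sketch_length_hint(obj, default=0):
    """Simplified, hypothetical re-statement of the PEP 424 lookup order."""
    # Step 1: an exact length wins if the object has one.
    try:
        return len(obj)
    except TypeError:
        pass
    # Step 2: otherwise look for the __length_hint__ hook on the type.
    hook = getattr(type(obj), "__length_hint__", None)
    if hook is None:
        return default  # Step 3: no hook at all, use the default
    try:
        hint = hook(obj)
    except TypeError:
        return default
    if hint is NotImplemented:
        return default
    if not isinstance(hint, int):
        raise TypeError("__length_hint__ must return an int")
    if hint < 0:
        raise ValueError("__length_hint__ must be >= 0")
    return hint


def gen():
    yield 1


# One sample per branch: len(), the hook, and the default.
samples = [[1, 2, 3], iter([1, 2, 3]), gen()]
sketch_results = [sketch_length_hint(s, 10) for s in samples]
real_results = [length_hint(s, 10) for s in samples]
```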

    In practice, an object has to implement __length_hint__ based on what it knows about itself. For example, take the range_iterator of the range backport (or the Py3.6 C code):

    return self._stop - self._current
    

    Here, the iterator knows how long it is at most, and how much it has already provided. If it did not keep track of the latter, it could still return how long it is at most. Either way, it must use internal knowledge about itself.
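    As an illustration of an iterator using its own state, here is a minimal sketch. Countdown is a hypothetical class written for this answer, not taken from the backport or from CPython.

```python
from operator import length_hint


class Countdown:
    """Hypothetical iterator yielding n, n-1, ..., 1."""

    def __init__(self, n):
        self._remaining = n

    def __iter__(self):
        return self

    def __next__(self):
        if self._remaining <= 0:
            raise StopIteration
        value = self._remaining
        self._remaining -= 1
        return value

    def __length_hint__(self):
        # The iterator tracks how much is left, so it can report it.
        # The hint must never be negative.
        return max(self._remaining, 0)


cd = Countdown(5)
before = length_hint(cd)  # nothing consumed yet
next(cd)
next(cd)
after = length_hint(cd)   # two items consumed
rest = list(cd)           # exhaust the remainder
```

    Because the hint is computed from the current state, a partially consumed Countdown reports only what is left, which matches the behavior noted in the question's edit.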

    Are there limitations and cases where it wouldn't work? ("hint" just sounds a bit suspicious).

    Obviously, objects that don't implement __length_hint__ or __len__ don't work. Fundamentally, any object that does not have enough knowledge about its state cannot implement it.

    Chained generators usually do not implement it. For example, (a ** 2 for a in range(5)) will not forward the length-hint from range. This is sensible if you consider that there may be an arbitrary chain of iterators: length_hint is only an optimization for pre-allocating space, and it may be faster to just fetch the content to put into that space.
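    A quick check of this, as a sketch assuming CPython, where the range iterator provides the hook and generator objects do not:

```python
from operator import length_hint

source = iter(range(5))
squares = (a ** 2 for a in range(5))

range_hint = length_hint(source)                # the range iterator knows its size
has_hook = hasattr(squares, "__length_hint__")  # generators lack the hook
gen_hint = length_hint(squares)                 # falls back to the default of 0
gen_hint_custom = length_hint(squares, 42)      # or to a caller-supplied default
```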

    In other cases, it may be plain impossible. Infinite and random iterators fall into this category, but also iterators over external resources.

    Is there a way to get the hint for zips and generators as well? Or is it something fundamental only to iterators?

    If an object does not implement __length_hint__, then no. Zip and generators don't, probably for the efficiency reasons above.

    Also note that zip and generator objects are their own iterators.

    foo = zip([1,2,3], [1,2,3])
    id(foo) == id(iter(foo))  # returns True in py3.5
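
    If you control the code that builds the zip, you can add the protocol yourself. The following is only a sketch: HintedZip is a hypothetical wrapper, not part of the standard library, and it assumes that the smallest hint among the inputs is a good enough estimate.

```python
from operator import length_hint


class HintedZip:
    """Hypothetical zip wrapper that forwards a length hint."""

    def __init__(self, *iterables):
        # Keep references to the underlying iterators so they can be
        # asked for their own hints later.
        self._iterators = [iter(it) for it in iterables]
        self._zip = zip(*self._iterators)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._zip)

    def __length_hint__(self):
        if not self._iterators:
            return 0
        # zip stops at the shortest input, so take the smallest hint.
        # Inputs without a hint contribute the default of 0.
        return min(length_hint(it) for it in self._iterators)


pairs = HintedZip([1, 2, 3], [10, 20])
hint_start = length_hint(pairs)
first = next(pairs)
hint_after_one = length_hint(pairs)
remaining = list(pairs)
```

    If any input is itself a generator, the hint degrades to 0, which is the same answer a plain zip would give.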