
Implement `__iter__()` and `__next__()` in different


I'm reading a book on Python which illustrates how to implement the iterator protocol.

class Fibbs:
    def __init__(self):
        self.a = 0
        self.b = 1
    def __next__(self):
        self.a, self.b = self.b, self.a + self.b
        return self.a
    def __iter__(self):
        return self
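Because Fibbs never raises StopIteration, a plain for loop over it would run forever. The sketch below checks the "both iterable and iterator" reading with `iter(fibs) is fibs`. It uses itertools.islice to take a bounded number of values. The variable name `fibs` is illustrative only.

```python
from itertools import islice


class Fibbs:
    def __init__(self):
        self.a = 0
        self.b = 1

    def __next__(self):
        self.a, self.b = self.b, self.a + self.b
        return self.a

    def __iter__(self):
        return self


fibs = Fibbs()
# iter() hands back the very same object, so Fibbs is its own iterator.
print(iter(fibs) is fibs)        # True
# islice stops after 10 items; Fibbs itself never raises StopIteration.
print(list(islice(fibs, 10)))    # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
# The state lives on the object, so a second pass continues where the first stopped.
print(list(islice(fibs, 3)))     # [89, 144, 233]
```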

Here, self is both the iterable and the iterator, I believe? However, the paragraph below says:

Note that the iterator implements the __iter__ method, which will, in fact, return the iterator itself. In many cases, you would put the __iter__ method in another object, which you would use in the for loop. That would then return your iterator. It is recommended that iterators implement an __iter__ method of their own in addition (returning self, just as I did here), so they themselves can be used directly in for loops.

Does this mean you can put __iter__() and __next__() in two different objects? Can it be done for objects belonging to different classes? Can it only be done for objects belonging to different classes? It might be a bit of a bizarre way of implementing the iterator protocol, but I just want to see how, provided it can actually be implemented like that.


Solution

    How you make iterators and iterables

    There are two ways to do this:

    1. Implement __iter__ to return self and nothing else, and implement __next__ on the same class. You've written an iterator.
    2. Implement __iter__ to return some other object that follows the rules of #1 (a cheap way to do this is to write it as a generator function so you don't have to hand-implement the other class). Don't implement __next__. You've written an iterable that is not an iterator.
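A minimal sketch of both recipes side by side. The class names CountdownIterator and Countdown are hypothetical and chosen only for illustration. The first class follows rule #1 (an iterator). The second follows rule #2 (a non-iterator iterable whose __iter__ is a generator function).

```python
class CountdownIterator:
    """Hypothetical rule #1 class: __iter__ returns self, __next__ is on the same class."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1


class Countdown:
    """Hypothetical rule #2 class: __iter__ returns a fresh iterator; no __next__."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # Generator function: every call creates a new, independent generator object.
        n = self.start
        while n > 0:
            yield n
            n -= 1


it = CountdownIterator(3)
print(list(it), list(it))              # [3, 2, 1] []: an iterator is single-use
cd = Countdown(3)
print(list(cd), list(cd))              # [3, 2, 1] [3, 2, 1]: each pass gets a new iterator
print(iter(it) is it, iter(cd) is cd)  # True False
```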

    For correctly implemented versions of each protocol, the way you tell them apart is the __iter__ method. If the body is just return self (maybe with a logging statement or something, but no other side effects), then either it's an iterator, or it was written incorrectly. If the body is anything else, then either it's a non-iterator iterable, or it was written incorrectly. Anything else violates the requirements of the protocols.
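One way to end up "written incorrectly" is to return self from __iter__ without providing __next__. The built-in iter() inspects the object that __iter__ returns and rejects anything that is not an iterator. Here is a minimal sketch using a deliberately broken, hypothetical class named Broken:

```python
class Broken:
    """Hypothetical, deliberately broken: __iter__ returns self but there is no __next__."""

    def __iter__(self):
        return self


try:
    iter(Broken())
except TypeError as exc:
    # CPython reports something like: iter() returned non-iterator of type 'Broken'
    print("TypeError:", exc)
```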

    In case #2, the other object would be of another class by definition (because you either have an idempotent __iter__ and implement __next__, or you only have __iter__, without __next__, which produces a new iterator).


    Why the protocol is designed this way

    The reason you need __iter__ even on iterators is to support patterns like:

     iterable = MyIterable(...)
     iterator = iter(iterable)  # Invokes MyIterable.__iter__
     next(iterator, None)  # Throw away first item
     for x in iterator:    # for implicitly calls iterator's __iter__; raises TypeError if __iter__ is missing
         ...               # loop body
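Below is a runnable version of that pattern. It assumes a list as a stand-in for MyIterable, since lists are non-iterator iterables. It also defines a hypothetical NextOnly class that has __next__ but no __iter__, to show why a for loop cannot use such an object:

```python
iterable = [10, 20, 30]      # stand-in for MyIterable(...); a list is a non-iterator iterable
iterator = iter(iterable)    # calls list.__iter__, producing a fresh list iterator
next(iterator, None)         # throw away the first item
rest = []
for x in iterator:           # iter(iterator) returns the same iterator, so the skip sticks
    rest.append(x)
print(rest)                  # [20, 30]


class NextOnly:
    """Hypothetical class with __next__ but no __iter__: not a proper iterator."""

    def __init__(self):
        self.n = 0

    def __next__(self):
        self.n += 1
        if self.n > 3:
            raise StopIteration
        return self.n


try:
    for x in NextOnly():     # the for loop calls iter() first, which needs __iter__
        print(x)
except TypeError as exc:
    print("TypeError:", exc)  # CPython: 'NextOnly' object is not iterable
```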
    

    For iterables, you always return a new iterator rather than making the iterable its own iterator and resetting its state whenever __iter__ is invoked. There are two reasons. First, it handles the case above: if MyIterable just returned itself and reset iteration, the for loop's implicit call to __iter__ would reset it again and undo the intended skip of the first element. Second, it supports patterns like this:

     for x in iterable:
         for y in iterable:  # Operating over product of all elements in iterable
             ...             # use the pair (x, y)
    

    If __iter__ reset the object to the beginning and there were only a single shared iteration state, this would:

    1. Get the first item and put it in x
    2. Reset, then iterate through the whole of iterable, putting each value in y
    3. Try to continue the outer loop, discover the shared state is already exhausted, and never give x any other value
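To make that failure concrete, here is a deliberately broken ResettingRange whose __iter__ resets shared state and returns self. It is compared against a correct iterable built from a generator function. Both class names are hypothetical, and pairs() is a small helper written for this sketch.

```python
class ResettingRange:
    """Hypothetical and BROKEN on purpose: __iter__ rewinds the single shared state."""

    def __init__(self, stop):
        self.stop = stop
        self.current = 0

    def __iter__(self):
        self.current = 0          # side effect: rewinds every loop using this object
        return self

    def __next__(self):
        if self.current >= self.stop:
            raise StopIteration
        value = self.current
        self.current += 1
        return value


class CorrectRange:
    """Hypothetical non-iterator iterable: every __iter__ call builds a new generator."""

    def __init__(self, stop):
        self.stop = stop

    def __iter__(self):
        n = 0
        while n < self.stop:
            yield n
            n += 1


def pairs(iterable):
    # Same shape as the nested for loops: the inner loop calls iter() once per x.
    return [(x, y) for x in iterable for y in iterable]


print(pairs(ResettingRange(3)))     # [(0, 0), (0, 1), (0, 2)]: the outer loop never advances
print(len(pairs(CorrectRange(3))))  # 9

# The skip-first pattern is undone too: list() calls iter() again, which rewinds.
it = iter(ResettingRange(3))
next(it)
print(list(it))                     # [0, 1, 2] rather than [1, 2]
```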

    It's also needed because Python code commonly treats `iter(x) is x` as a safe, side-effect-free way to test whether an iterable is an iterator. If your __iter__ modifies your own state, it's not side-effect-free. At worst, for iterables, it should waste a little time making an iterator that is immediately thrown away. For iterators, it should be effectively free (since it just returns itself).
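Here is a sketch of that identity check, wrapped in a hypothetical helper named is_iterator(). It is only cheap and safe because well-behaved __iter__ methods have no side effects.

```python
def is_iterator(obj):
    """Hypothetical helper: True if obj is an iterator, via the iter(obj) is obj test.

    Assumes obj is iterable; iter() raises TypeError otherwise.
    """
    return iter(obj) is obj


numbers = [1, 2, 3]
print(is_iterator(numbers))                  # False: a list returns a new list iterator
print(is_iterator(iter(numbers)))            # True: iterators return themselves
print(is_iterator(n * 2 for n in numbers))   # True: generators are iterators
print(is_iterator(range(3)))                 # False: range is a non-iterator iterable
```

The standard library also provides collections.abc.Iterator. An isinstance() check against it looks for both __iter__ and __next__ instead of calling __iter__.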


    To answer your questions directly:

    Does this mean you can put __iter__() and __next__() in two different objects?

    For iterators, you can't (an iterator must have both methods, though its __iter__ is trivial). For non-iterator iterables, you must (the iterable should have only __iter__, which returns some other iterator object). There is no "can".

    Can it be done for objects belonging to different classes?

    Yes.

    Can it only be done for objects belonging to different classes?

    Yes.


    Examples

    Example of iterable:

    class MyRange:
        def __init__(self, start, stop):
            self.start = start
            self.stop = stop
    
        def __iter__(self):
            return MyRangeIterator(self)  # Returns new iterator, as this is a non-iterator iterable
    
        # Likely to have other methods (because iterables are often collections of
        # some sort and support many other behaviors)
        # Does *not* have __next__, as this is not an iterator
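Because MyRange has no __next__, it can be looped over but cannot be passed to next() directly. The small check below makes one assumption: so that the snippet runs on its own, __iter__ delegates to iter(range(...)) instead of using the MyRangeIterator class.

```python
class MyRange:
    def __init__(self, start, stop):
        self.start = start
        self.stop = stop

    def __iter__(self):
        # Assumption: stand-in for MyRangeIterator(self) so this snippet is self-contained.
        return iter(range(self.start, self.stop))


r = MyRange(2, 5)
print(list(r))                # [2, 3, 4]
print(list(r))                # [2, 3, 4]: a new iterator each time
try:
    next(r)                   # no __next__ on the iterable itself
except TypeError as exc:
    print("TypeError:", exc)  # CPython: 'MyRange' object is not an iterator
```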
    

    Example of iterator:

    class MyRangeIterator:  # Class is often non-public and/or defined inside the iterable as a
                            # nested class; it exists solely to store state for the iterator
        def __init__(self, rangeobj):  # Constructed from iterable; could pass raw values if you preferred
            self.current = rangeobj.start
            self.stop = rangeobj.stop
        def __iter__(self):
            return self             # Returns self, because this is an iterator
        def __next__(self):         # Has __next__ because this is an iterator
            retval = self.current   # Must cache current because we need to modify it before we return
            if retval >= self.stop:
                raise StopIteration # Indicates iterator exhausted
            self.current += 1       # Ensure state updated for next call
            return retval           # Return cached value
    
        # Unlikely to have other methods; iterators are generally iterated and that's it
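The comment on MyRangeIterator mentions defining the iterator as a nested class. The sketch below shows that layout. The nested class name _Iterator is a hypothetical choice, not something Python requires. The sketch also shows that two iterators over one MyRange advance independently.

```python
class MyRange:
    def __init__(self, start, stop):
        self.start = start
        self.stop = stop

    def __iter__(self):
        return MyRange._Iterator(self)    # a new iterator on every call

    class _Iterator:
        """Hypothetical private nested class that holds the per-iteration state."""

        def __init__(self, rangeobj):
            self.current = rangeobj.start
            self.stop = rangeobj.stop

        def __iter__(self):
            return self

        def __next__(self):
            if self.current >= self.stop:
                raise StopIteration
            value = self.current
            self.current += 1
            return value


r = MyRange(0, 3)
a, b = iter(r), iter(r)
print(next(a), next(a), next(b))   # 0 1 0: each iterator keeps its own position
print(list(a), list(b))            # [2] [1, 2]
```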
    

    Example of "easy iterable" where you don't implement your own iterator class, by making __iter__ a generator function:

    class MyEasyRange:
        def __init__(self, start, stop):  # Same as for MyRange
            self.start = start
            self.stop = stop
    
        def __iter__(self):  # Generator function is simpler (and usually faster)
                             # than writing your own iterator class
            current = self.start  # Local variable, not an attribute: several iterators may run over this one iterable at once
            while current < self.stop:
                yield current     # Produces value and freezes generator until iteration resumes
                current += 1
            # Returning (here, by reaching the end of the function) makes the generator raise StopIteration
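Here is a quick usage check of the generator-based version. Each call to __iter__ returns a brand-new generator object, so nested loops work without writing an iterator class. The class is repeated so the snippet runs on its own.

```python
class MyEasyRange:
    def __init__(self, start, stop):
        self.start = start
        self.stop = stop

    def __iter__(self):
        current = self.start
        while current < self.stop:
            yield current
            current += 1


r = MyEasyRange(1, 3)
g1, g2 = iter(r), iter(r)
print(type(g1).__name__)                     # generator
print(g1 is g2)                              # False: independent iterators
print([(x, y) for x in r for y in r])        # [(1, 1), (1, 2), (2, 1), (2, 2)]
print(next(g1), next(g1), next(g1, "done"))  # 1 2 done: exhausted generator hits the default
```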