Why are some (but not all) Python iterators summable after being exhausted?


In Python, iterators are intended for one-time use. Once an iterator has raised StopIteration, it shouldn't return any more values. Yet if I define a custom iterator, it seems that I can still sum the values after they're exhausted!

Example code (Python 3.6.5, or replace __next__(self) with next(self) to see the same behaviour in Python 2.7.15):

class CustomIterator:
    def __iter__(self):
        self.n = 0
        return self

    def __next__(self):
        self.n += 1
        if self.n > 3:
            raise StopIteration
        return self.n

i1 = iter([1, 2, 3])
i2 = iter(CustomIterator())

print('Sum of i1 is {}'.format(sum(i1)))  # returns 6 as expected
print('Sum of i1 is {}'.format(sum(i1)))  # returns 0 because i1 is now exhausted
try:
    print(next(i1))
except StopIteration:
    print("i1 has raised StopIteration")  # this exception happens
print('Sum of i1 is {}'.format(sum(i1)))  # 0 again

print('Sum of i2 is {}'.format(sum(i2)))  # returns 6 as expected
print('Sum of i2 is {}'.format(sum(i2)))  # returns 6 again!
try:
    print(next(i2))
except StopIteration:
    print("i2 has raised StopIteration")  # still get an exception
print('Sum of i2 is {}'.format(sum(i2)))  # and yet we get 6 again

Why do i1 and i2 behave differently? Is it some trick in how sum is implemented? I've checked https://docs.python.org/3/library/functions.html#sum and it doesn't give me a lot to go on.

Related questions:

These describe the expected behaviour for built-in iterators, but don't explain why my custom iterator behaves differently.


Solution

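  • The problem is that the custom iterator is initialised inside its __iter__ method. Even though i2 = iter(CustomIterator()) already calls __iter__ once, sum (like min, max, list() and a for loop) calls iter() on its argument again, which runs i2.__iter__() a second time and resets self.n to 0. The built-in list iterator i1 doesn't reset, because its __iter__ simply returns the iterator itself without touching its position, so once it is exhausted it stays exhausted.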

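    There are plenty of tutorials on "how to make Python iterators", and about half of them say something like "to make an iterator, you just have to define __iter__ and __next__ methods". That is technically correct as per the documentation, but it leaves out where the iterator's state should be set up, and that will get you into trouble sometimes. The documentation says an iterator's __iter__ should return the iterator object itself; if it also resets state, every implicit iter() call restarts the iteration. So you'll usually also want a separate __init__ method to initialise the iterator.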

    So to fix this problem, redefine CustomIterator as:

    class CustomIterator:
        def __init__(self):
            # state is set up once, when the iterator is created
            self.n = 0

        def __iter__(self):
            # just return self; don't reset anything here
            return self

        def __next__(self):
            self.n += 1
            if self.n > 3:
                raise StopIteration
            return self.n

    i1 = iter([1, 2, 3])
    i2 = CustomIterator()  # iter(...) is not needed here, but does no harm
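
    Now __init__ is called exactly once, when the iterator is created, and repeated calls to iter() (including the implicit ones made by sum) return the same object without resetting it. A second sum(i2) therefore returns 0, just like sum(i1).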
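A quick way to confirm this explanation is to count how often __iter__ gets called. The sketch below uses two hypothetical classes that are not part of the original code: CountingIterator initialises its state in __init__ (like the fixed CustomIterator) and records every __iter__ call, while ResettingIterator resets its state in __iter__ (like the original version).

```python
class CountingIterator:
    """Hypothetical iterator: state set up in __init__, __iter__ calls counted."""

    def __init__(self, limit=3):
        self.n = 0
        self.limit = limit
        self.iter_calls = 0  # how many times __iter__ has been invoked

    def __iter__(self):
        self.iter_calls += 1
        return self  # an iterator's __iter__ should just return itself

    def __next__(self):
        self.n += 1
        if self.n > self.limit:
            raise StopIteration
        return self.n


class ResettingIterator(CountingIterator):
    """Hypothetical iterator reproducing the original bug."""

    def __iter__(self):
        self.n = 0  # resetting here means every iter() call restarts it
        return super().__iter__()


it = CountingIterator()
first = sum(it)   # sum() calls iter(it) internally, then exhausts it
second = sum(it)  # iter(it) returns the same, already exhausted object
print(first, second, it.iter_calls)  # 6 0 2

reset = ResettingIterator()
again = (sum(reset), sum(reset))  # each sum() resets the state first
print(again)  # (6, 6)

# Built-in iterators follow the protocol: iter() on an iterator returns it unchanged.
li = iter([1, 2, 3])
print(iter(li) is li)  # True
```

The iter_calls counter reaching 2 after two sums shows that sum really does call iter() on an object that is already an iterator, which is harmless only when __iter__ leaves the state alone.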