
How does a range object allow multiple iterations while a generator object does not?


I am wondering about the differences between range and generator in Python.

I have done some research and found some useful answers like this one and this one which explain the differences between these objects, despite the fact that they may return similar things.

One of the differences I wanted to explore is that a range object can be iterated over multiple times while a generator object cannot. To demonstrate this more clearly to myself, I considered the following code:

def my_range(first=0, last=3, step=1):
    number = first
    while number < last:
        yield number
        number+=step

a = range(0,3)
b = my_range()

for i in a:
    print(i)
print("First")
for i in a:
    print(i)
print("Second")
for i in b:
    print(i)
print("Third")
for i in b:
    print(i)
print("Fourth")

Which outputs:

0
1
2
First
0
1
2
Second
0
1
2
Third
Fourth

It is clear to me from this that the generator gets "used up" while range does not. But I am having trouble finding where this behavior is defined in the source code, or even where the source code itself lives. I am not sure where to start looking for, and how to interpret, the code that dictates that a range object can be iterated over multiple times but a generator object cannot.

I would like help finding and understanding how a property such as how many times an object can be iterated over is implemented in Python's source code.


Solution

  • A range object is an iterable sequence but not an iterator, while a generator is an iterable that is also its own iterator.

    An iterable is an object that produces iterators, and each iterator stores its own iteration state. This can be seen if we play around a bit with range, its iterators, and next.

    First, we can see that range is not an iterator if we try to call next on it:

    In [1]: next(range(0))
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    Input In [1], in <module>
    ----> 1 next(range(0))
    
    TypeError: 'range' object is not an iterator
    

    We can create the iterator ourselves by calling the iter builtin. When called on our range, it gives us an object of a different type, range_iterator:

    In [2]: iter(range(0))
    Out[2]: <range_iterator at 0x28573eabc90>
    

    Each iterator created from the iterable stores its own iteration state (say, an index into the range object that is incremented every time the iterator is advanced), so we can use them independently:

    In [3]: range_obj = range(10)
    
    In [4]: iterator_1 = iter(range_obj)
    
    In [5]: iterator_2 = iter(range_obj)
    
    In [6]: [next(iterator_1) for _ in range(5)]  # advance iterator_1 5 times
    Out[6]: [0, 1, 2, 3, 4]
    
    In [7]: next(iterator_2)  # left unchanged, fetches first item from range_obj
    Out[7]: 0
    

    Python also creates an iterator by itself when a for loop is used, which can be seen if we take a look at the instructions generated for it (the exact bytecode varies between Python versions):

    In [8]: import dis; dis.dis("for a in b: ...")
      1           0 LOAD_NAME                0 (b)
                  2 GET_ITER
            >>    4 FOR_ITER                 4 (to 10)
                  6 STORE_NAME               1 (a)
                  8 JUMP_ABSOLUTE            4
            >>   10 LOAD_CONST               0 (None)
                 12 RETURN_VALUE
    

    Here, GET_ITER does the same thing as calling iter(b).
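As a rough sketch of what the for loop does with that iterator, the loop can be written out by hand with iter and next. The helper name manual_for is made up for this illustration and is not part of Python; it only mimics the loop's behavior, not how CPython implements it.

```python
def manual_for(iterable):
    """Collect items the way a for loop would, using iter() and next() explicitly."""
    collected = []
    iterator = iter(iterable)      # the step GET_ITER performs
    while True:
        try:
            item = next(iterator)  # the step FOR_ITER performs on each pass
        except StopIteration:      # raised by the iterator once it is exhausted
            break
        collected.append(item)
    return collected


numbers = range(3)
print(manual_for(numbers))  # [0, 1, 2]
print(manual_for(numbers))  # [0, 1, 2] again: iter() built a fresh iterator
```

Because every call starts with iter(iterable), a range gets a brand-new range_iterator each time, which is why looping over it twice works.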

    With a generator, calling the generator function gives you an iterator directly; there is no separate iterable object above it for new iterators to be created from. Calling the generator function can be seen as playing the role of iter(...), except that everything the iterator needs is passed in by the user as arguments to the function, rather than being fetched from an iterable object that created it. Once that single iterator is exhausted, nothing is left to produce a fresh one, which is why the second loop over b in the question prints nothing.
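To make this concrete, here is a small sketch using the my_range generator function from the question. It shows that calling iter on a generator returns the generator itself, and that wrapping the generator function in a class with an __iter__ method gives range-like reusability. The class name MyRange is an assumption made for this example, not something from the standard library.

```python
def my_range(first=0, last=3, step=1):
    number = first
    while number < last:
        yield number
        number += step


gen = my_range()
print(iter(gen) is gen)  # True: a generator is its own iterator
print(list(gen))         # [0, 1, 2]
print(list(gen))         # []: the only iterator is already exhausted

r = range(3)
print(iter(r) is r)      # False: range hands out separate iterator objects


class MyRange:
    """Hypothetical reusable iterable: every iter() call starts a new generator."""

    def __init__(self, first=0, last=3, step=1):
        self.first, self.last, self.step = first, last, step

    def __iter__(self):
        # The arguments are stored on the object, so a fresh iterator
        # can be created from them on demand.
        return my_range(self.first, self.last, self.step)


reusable = MyRange()
print(list(reusable))    # [0, 1, 2]
print(list(reusable))    # [0, 1, 2]
```

The only difference between gen and reusable is where the starting information lives: in the arguments of a single call, or on an object that can call the generator function again.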