Implicit memory consumption when using itertools.tee on generators


I'm using the answer(s) from here and here to check if my generator x is empty.

from itertools import tee
def my_generator():
    yield from range(100000000)
x = my_generator()
x, y = tee(x)
try:
    next(y)
except StopIteration:
    # y is exhausted immediately, so x is empty: do something
    quit()

What happens to the elements consumed from x? Can they be discarded, or must they be kept in memory for y?

# now consume x entirely
for z in x:
    print(z)
# how can y still iterate over these objects?
# do they all have to reside in memory now?

Solution

  • tl;dr: elements yielded from x are kept in memory until they have been yielded from y as well.

    When using itertools.tee, every element pulled from the source must be buffered until all the iterators returned by tee have yielded it. In your case, those are x and y. The buffer holds references to the elements, not copies.

    This is done to allow full iteration through both x and y.

    If you look at the equivalent implementation in the Python docs for itertools.tee, you can see that each yielded value is saved until it has been yielded from all the iterators tee returned.

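    In your case, you'll have to consume both x and y, or have them (both) go out of scope and be garbage-collected, for the items to be released. Until then, consuming x entirely leaves almost the whole sequence buffered for y. The itertools documentation notes that if one iterator uses most or all of the data before another starts, it is faster to use list() instead of tee().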
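
The buffering described above can be observed directly. The following is a minimal sketch, not a benchmark: Item, count_alive and demo are hypothetical helper names introduced here for illustration, and Item exists only because weak references cannot track plain ints. It counts how many yielded objects are still alive after x is consumed, after y is dropped, and after x is dropped as well.

```python
import gc
import weakref
from collections import deque
from itertools import tee


class Item:
    """Hypothetical placeholder object (ints cannot be weakly referenced)."""

    def __init__(self, n):
        self.n = n


def my_generator(count, refs):
    # Yield `count` objects and record a weak reference to each one,
    # so we can later check which objects are still alive.
    for n in range(count):
        item = Item(n)
        refs.append(weakref.ref(item))
        yield item


def count_alive(refs):
    # Force a collection so the count does not depend on GC timing.
    gc.collect()
    return sum(1 for r in refs if r() is not None)


def demo(count=1000):
    refs = []
    x, y = tee(my_generator(count, refs))

    next(y)              # emptiness check, as in the question
    deque(x, maxlen=0)   # consume x entirely without keeping any item

    after_x = count_alive(refs)       # items still buffered for y

    del y                             # drop the lagging iterator
    after_del_y = count_alive(refs)

    del x                             # drop the remaining tee iterator
    after_del_both = count_alive(refs)

    return after_x, after_del_y, after_del_both


if __name__ == "__main__":
    print(demo())
```

The first number should be close to count, since nothing except the single item already taken from y can be released while y lags behind. The last number should be 0. The middle number depends on how the interpreter stores the buffer internally, so no specific value is claimed for it here.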