Tags: python, python-itertools, tee

Why does a for loop behave like this (in conjunction with tee)?


I'm attempting to iterate over pairs of combinations.

Although I have figured out a better way of doing this, from both a conceptual and pragmatic perspective, this was my first impulse, and I'm wondering why it didn't work.

import itertools

gen = itertools.combinations(range(1, 6), 3)
for i in gen:
  gen, gencopy = itertools.tee(gen)
  for j in gencopy:
    print(i, j)

Gives the following output:

(1, 2, 3) (1, 2, 4)
(1, 2, 3) (1, 2, 5)
(1, 2, 3) (1, 3, 4)
(1, 2, 3) (1, 3, 5)
(1, 2, 3) (1, 4, 5)
(1, 2, 3) (2, 3, 4)
(1, 2, 3) (2, 3, 5)
(1, 2, 3) (2, 4, 5)
(1, 2, 3) (3, 4, 5)

Which means that only one value of i is iterated over.

However if I change the tee line to:

_, gencopy = itertools.tee(gen)

I get the full set of expected pairs.

(Note: I have since figured out that the best way to do this is to simply feed the generator back through itertools.combinations to get combinatorial pairs, which also avoids the performance issues that the documentation attributes to tee. However, I'm curious about the behavior of the for loop and why changing the generator in this manner causes it to bail early.)


Solution

  • From the documentation of itertools.tee:

    Once tee() has made a split, the original iterable should not be used anywhere else; otherwise, the iterable could get advanced without the tee objects being informed.

    The idea of tee, when it returns more than one iterator, is to share the objects between those iterators (so each one appears to be "consuming" the original list on its own).
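
    To illustrate the warning quoted above, here is a small sketch of what can go wrong when the original iterator is advanced after the split. The names source, a and b are made up for this example and do not come from the question.

```python
import itertools

source = iter([1, 2, 3, 4])
a, b = itertools.tee(source)

first_a = next(a)       # pulls 1 from source and buffers it for b
skipped = next(source)  # advances source directly; neither tee object sees 2
rest_a = list(a)        # a continues from wherever source is now
all_b = list(b)         # b replays the buffered 1, then the shared remainder

print(first_a, skipped, rest_a, all_b)
```

    The value 2 is lost to both a and b, because it was taken from source behind their backs.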

    This itertool may require significant auxiliary storage (depending on how much temporary data needs to be stored). In general, if one iterator uses most or all of the data before another iterator starts, it is faster to use list() instead of tee().

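    This is exactly what happens in your case. The for loop calls iter() on the original combinations object once and keeps using that object; rebinding the name gen inside the loop body does not change what the loop iterates over. The inner loop over gencopy pulls every remaining item out of that same underlying iterator, so when the outer loop asks for its next item the iterator is already exhausted and the loop exits at once.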
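
    A rough way to check this explanation is to keep a separate reference to the original iterator and inspect it after the loop. The names original, outer_values and remaining are introduced only for this sketch; the inner print is replaced by a count to keep the output short.

```python
import itertools

original = itertools.combinations(range(1, 6), 3)
gen = original
outer_values = []

for i in gen:  # the loop obtains its iterator from gen once, up front
    gen, gencopy = itertools.tee(gen)  # rebinding gen does not affect the loop
    inner_count = sum(1 for _ in gencopy)  # drains the underlying iterator
    outer_values.append((i, inner_count))

remaining = list(gen)  # the tee copy bound to gen still holds the buffered items
print(outer_values)
print(len(remaining), next(original, None))
```

    The outer loop runs once, the inner loop sees the other 9 combinations, and the original iterator is empty afterwards, while the tee object now bound to gen still has those 9 items buffered.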

    Workaround that the documentation suggests:

    import itertools

    # Materialize the combinations once so they can be iterated repeatedly.
    gen = list(itertools.combinations(range(1, 6), 3))
    for i in gen:
      for j in gen:
        print(i, j)
    

    but of course this may have a high memory footprint, since you give up the generator's lazy evaluation right from the start. So your idea of using combinations instead of a double loop is probably best.
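
    The approach mentioned in the question, feeding the generator back through itertools.combinations, could look roughly like this. The names triples and pairs are chosen for this sketch. Note that it yields each unordered pair of distinct combinations once, which is not the same set as the double loop over a list (that one also produces self-pairs and both orderings). As far as I know, combinations also copies its input into an internal tuple, so the 10 triples are still held in memory.

```python
import itertools

triples = itertools.combinations(range(1, 6), 3)
# Each element of pairs is a tuple of two distinct 3-element combinations.
pairs = list(itertools.combinations(triples, 2))

for i, j in pairs[:3]:
    print(i, j)
print(len(pairs))
```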

    A related Q&A: How to clone a Python generator object?