How to make itertools.tee() yield copies of iterated elements?


I am using itertools.tee to make copies of generators that yield dictionaries, and I pass the iterated dictionaries to functions that I don't control and that may modify them. I would therefore like to pass copies of the dictionaries to those functions, but all the tees yield references to the same instance.

This is illustrated by the following simple example:

import itertools

original_list = [{'a':0,'b':1}, {'a':1,'b':2}]
tee1, tee2 = itertools.tee(original_list, 2)

for d1, d2 in zip(tee1, tee2):
    d1['a'] += 1
    print(d1)
    d2['a'] -= 1
    print(d2)

The output is:

{'b': 1, 'a': 1}
{'b': 1, 'a': 0}
{'b': 2, 'a': 2}
{'b': 2, 'a': 1}

While I would like to have:

{'b': 1, 'a': 1}
{'b': 1, 'a': -1}
{'b': 2, 'a': 2}
{'b': 2, 'a': 0}

Of course, in this example there would be many ways to work around this easily, but due to my specific use case, I need a version of itertools.tee that stores copies of all iterated objects in the queues of the tees instead of references to the original.

Is there a straightforward way to do this in Python or would I have to re-implement itertools.tee in a non-native and, hence, inefficient way?


Solution

  • There is no need to rework tee. Just wrap each iterator returned by tee in a map(dict, ...) call:

    import itertools

    try:
        # On Python 2, swap the list-building map for the lazy, iterator-based one
        from future_builtins import map
    except ImportError:
        # Python 3: the built-in map is already lazy
        pass
    
    tee1, tee2 = itertools.tee(original_list, 2)
    tee1, tee2 = map(dict, tee1), map(dict, tee2)
    

    This automatically produces a shallow copy of each dictionary as you iterate.

    Demo (using Python 3.6):

    >>> import itertools
    >>> original_list = [{'a':0,'b':1}, {'a':1,'b':2}]
    >>> tee1, tee2 = itertools.tee(original_list, 2)
    >>> tee1, tee2 = map(dict, tee1), map(dict, tee2)
    >>> for d1, d2 in zip(tee1, tee2):
    ...     d1['a'] += 1
    ...     print(d1)
    ...     d2['a'] -= 1
    ...     print(d2)
    ...
    {'a': 1, 'b': 1}
    {'a': -1, 'b': 1}
    {'a': 2, 'b': 2}
    {'a': 0, 'b': 2}
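
Because dict(...) only makes a shallow copy, nested mutable values would still be shared between the tees. If that matters, the same wrapping idea should work with copy.deepcopy in place of dict. The following is a minimal sketch, assuming Python 3 (lazy map); tee_copies and its copier parameter are hypothetical names made up for this illustration, not part of itertools:

```python
import copy
import itertools


def tee_copies(iterable, n=2, copier=copy.deepcopy):
    """Return n iterators that each yield their own copy of every item.

    Hypothetical helper: it wraps each iterator from itertools.tee in a
    lazy map, so an item is copied at the moment a branch consumes it.
    """
    return tuple(map(copier, branch) for branch in itertools.tee(iterable, n))


# Shallow copies (copier=dict): top-level keys are independent,
# but the nested dict is still the same object in both branches.
shallow_source = [{'a': 0, 'nested': {'x': 1}}]
s1, s2 = tee_copies(shallow_source, copier=dict)
d1, d2 = next(s1), next(s2)
d1['a'] += 1            # only d1 changes
d1['nested']['x'] += 1  # visible through d2 as well

# Deep copies (the default here): nested data is independent too.
deep_source = [{'a': 0, 'nested': {'x': 1}}]
t1, t2 = tee_copies(deep_source)
e1, e2 = next(t1), next(t2)
e1['nested']['x'] += 1  # e2 and deep_source are unaffected
```

Note that tee still buffers references to the original objects, and the copy is taken only when a branch pulls the item. With shallow copies, a change to nested data made through one branch can therefore show up in a copy that another branch takes later; deepcopy avoids this at the cost of more copying work per item.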