
tee() function from itertools library


Here is a simple example that gets the min, max, and avg (here computed with median()) values from a list. The two functions below return the same result. What is the difference between them? Why use itertools.tee(), and what advantage does it provide?

from statistics import median
from itertools import tee

purchases = [1, 2, 3, 4, 5]

def process_purchases(purchases):
    min_, max_, avg = tee(purchases, 3)
    return min(min_), max(max_), median(avg)

def _process_purchases(purchases):
    return min(purchases), max(purchases), median(purchases)

def main():
    stats = process_purchases(purchases=purchases)
    print("Result:", stats)
    stats = _process_purchases(purchases=purchases)
    print("Result:", stats)

if __name__ == '__main__':
    main()

Solution

  • Iterators can only be iterated once in Python. After that they are "exhausted" and return no more values.

    You can see this with the iterators returned by map(), zip(), filter(), and many other functions:

    purchases = [1, 2, 3, 4, 5]
    
    double = map(lambda n: n*2, purchases)
    
    print(list(double))
    # [2, 4, 6, 8, 10]
    
    print(list(double))
    # [] <-- can't use it twice
    

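    With a list, your two functions behave the same, because a list is not an iterator: each call to min(), max(), or median() starts a fresh pass over it. You can see the difference if you pass them an iterator instead, such as the return value of map(). In that case _process_purchases() fails because min() exhausts the iterator and leaves no values for max() and median().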

    tee() takes an iterable and returns n independent iterators over it (two by default), allowing you to use the iterator passed into the function more than once:

    from itertools import tee
    from statistics import median
    
    purchases = [1, 2, 3, 4, 5]
    
    def process_purchases(purchases):
        min_, max_, avg = tee(purchases, 3)
        return min(min_), max(max_), median(avg)
    
    
    def _process_purchases(purchases):
        return min(purchases), max(purchases), median(purchases)
    
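    double = map(lambda n: n*2, purchases)
    try:
        _process_purchases(double)
    except ValueError as exc:
        # min() consumed the whole iterator, so max() received nothing
        print("ValueError:", exc)
    # ValueError: max() arg is an empty sequence
    # (the exact message wording may differ between Python versions)
    
    double = map(lambda n: n*2, purchases)
    print(process_purchases(double))
    # (2, 10, 6)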
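
One caveat on the "what advantage" part of the question: tee() has to buffer every item that one copy has consumed and another has not. When one copy is read to the end before the next one starts, as with min() followed by max() above, everything ends up buffered, and the itertools documentation notes that list() is faster in that case. tee() pays off when the copies are consumed roughly in step. The sketch below illustrates both cases; process_purchases_list is a hypothetical name that is not part of the original code.

```python
from itertools import tee
from statistics import median


def process_purchases_list(purchases):
    # Hypothetical variant: materialize the input once, then reuse the list.
    # Works for lists and for one-shot iterators alike.
    values = list(purchases)
    return min(values), max(values), median(values)


purchases = [1, 2, 3, 4, 5]

# A map object is a one-shot iterator; list() captures its values once.
double = map(lambda n: n * 2, purchases)
result = process_purchases_list(double)
print(result)
# (2, 10, 6)

# tee() is a better fit when the copies advance together, because it then
# only buffers the gap between them. Example: adjacent pairs.
left, right = tee(map(lambda n: n * 2, purchases))
next(right, None)  # move the second copy one item ahead
pairs = list(zip(left, right))
print(pairs)
# [(2, 4), (4, 6), (6, 8), (8, 10)]
```

Once tee() has been called, the original iterator should not be advanced directly any more, since the copies would silently miss the items it consumed.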