tee() function from itertools library

Here is a simple example that gets the min, max, and avg values from a list. The two functions below return the same result. What is the difference between them, and why use itertools.tee()? What advantage does it provide?

from statistics import median
from itertools import tee

purchases = [1, 2, 3, 4, 5]

def process_purchases(purchases):
    min_, max_, avg = tee(purchases, 3)
    return min(min_), max(max_), median(avg)

def _process_purchases(purchases):
    return min(purchases), max(purchases), median(purchases)

def main():
    stats = process_purchases(purchases=purchases)
    print("Result:", stats)
    stats = _process_purchases(purchases=purchases)
    print("Result:", stats)

if __name__ == '__main__':
    main()

Solution

Iterators can only be iterated once in Python. After that they are "exhausted" and don't return any more values.

You can see this with the iterators returned by functions like map(), zip(), filter() and many others:

purchases = [1, 2, 3, 4, 5]

double = map(lambda n: n*2, purchases)

print(list(double))
# [2, 4, 6, 8, 10]

print(list(double))
# [] <-- can't use it twice

You can see the difference between your two functions if you pass them an iterator, such as the return value from map(). In this case _process_purchases() fails because min() exhausts the iterator and leaves no values for max() and median().
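With a plain list, both functions behave the same because a list is an iterable rather than an iterator: every consumer gets a fresh pass over it. The short sketch below illustrates that distinction. The names first_pass and second_pass are only illustrative and are not part of the original code.

```python
purchases = [1, 2, 3, 4, 5]

# A list is an iterable, not an iterator: each iter() call starts a fresh pass.
first_pass = iter(purchases)
second_pass = iter(purchases)
print(first_pass is second_pass)  # False

# A map object is its own iterator, so every consumer shares a single pass.
double = map(lambda n: n * 2, purchases)
print(iter(double) is double)  # True

# The list can be scanned as often as needed...
print(min(purchases), max(purchases))  # 1 5

# ...but the map object is empty after the first full scan.
print(min(double))   # 2
print(list(double))  # []
```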

tee() takes an iterable and returns n independent iterators over it (two by default), so the values passed into the function can be consumed more than once:

from itertools import tee
from statistics import median

purchases = [1, 2, 3, 4, 5]

def process_purchases(purchases):
    min_, max_, avg = tee(purchases, 3)
    return min(min_), max(max_), median(avg)


def _process_purchases(purchases):
    return min(purchases), max(purchases), median(purchases)

double = map(lambda n: n*2, purchases)
_process_purchases(double)
# ValueError: max() arg is an empty sequence (exact wording varies by Python version)

double = map(lambda n: n*2, purchases)
process_purchases(double)
# (2, 10, 6)
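One caveat worth knowing: tee() has to buffer any values that one copy has consumed and another has not. Here min() reads its whole copy before max() starts, so every value ends up buffered anyway. The itertools documentation notes that in this situation it is generally faster to use list() instead of tee(). A minimal sketch of that alternative follows; process_purchases_list is a hypothetical name, and it assumes the input is finite and fits in memory.

```python
from statistics import median

def process_purchases_list(purchases):
    # Hypothetical variant: materialize the iterable once, then reuse the list.
    # Assumes the input is finite and small enough to hold in memory.
    values = list(purchases)
    return min(values), max(values), median(values)

double = map(lambda n: n * 2, [1, 2, 3, 4, 5])
print(process_purchases_list(double))
# (2, 10, 6)
```

tee() is a better fit when the copies are consumed roughly in step with each other, since only the values between the fastest and slowest copy need to be held at any time.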