Here is a simple example that gets the min, max, and avg values from a list.
The two functions below return the same result.
I want to know the difference between these two functions.
Why use itertools.tee()? What advantage does it provide?
from statistics import median
from itertools import tee

purchases = [1, 2, 3, 4, 5]

def process_purchases(purchases):
    min_, max_, avg = tee(purchases, 3)
    return min(min_), max(max_), median(avg)

def _process_purchases(purchases):
    return min(purchases), max(purchases), median(purchases)

def main():
    stats = process_purchases(purchases=purchases)
    print("Result:", stats)
    stats = _process_purchases(purchases=purchases)
    print("Result:", stats)

if __name__ == '__main__':
    main()
Iterators can only be iterated once in Python. After that they are "exhausted" and don't return any more values.
You can see this with the iterators returned by functions like map(), zip(), filter(), and many others:
purchases = [1, 2, 3, 4, 5]
double = map(lambda n: n*2, purchases)
print(list(double))
# [2, 4, 6, 8, 10]
print(list(double))
# [] <-- can't use it twice
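This also explains why both of your functions agree when given a list: a list is an iterable rather than an iterator, so every function that loops over it starts from the beginning. A minimal sketch of the contrast (the variable names here are only illustrative):

```python
purchases = [1, 2, 3, 4, 5]

# A list can be traversed as many times as needed.
first_pass = (min(purchases), max(purchases))
second_pass = (min(purchases), max(purchases))
print(first_pass, second_pass)

# An explicit iterator over the same list is consumed by the first traversal.
it = iter(purchases)
smallest = min(it)    # reads every value from the iterator
leftover = list(it)   # nothing is left to read
print(smallest, leftover)
```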
You can see the difference between your two functions if you pass them an iterator, such as the return value from map(). In this case _process_purchases() fails because min() exhausts the iterator and leaves no values for max() and median().
tee() takes an iterator and gives you two or more independent iterators over the same values, allowing you to use the iterator passed into the function more than once:
from itertools import tee
from statistics import median

purchases = [1, 2, 3, 4, 5]

def process_purchases(purchases):
    min_, max_, avg = tee(purchases, 3)
    return min(min_), max(max_), median(avg)

def _process_purchases(purchases):
    return min(purchases), max(purchases), median(purchases)

double = map(lambda n: n*2, purchases)
_process_purchases(double)
# ValueError: max() arg is an empty sequence

double = map(lambda n: n*2, purchases)
process_purchases(double)
# (2, 10, 6)
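One caveat worth knowing: in process_purchases(), min() reads its copy to the end before max() starts, so tee() has to buffer every value anyway. The itertools documentation notes that when one iterator uses most or all of the data before another starts, list() is faster than tee(). A possible alternative under that assumption is sketched below; process_purchases_list is a hypothetical name, not part of the original code:

```python
from statistics import median

def process_purchases_list(purchases):
    # Hypothetical variant: read the input once into a list,
    # which can then be traversed any number of times.
    values = list(purchases)
    return min(values), max(values), median(values)

double = map(lambda n: n * 2, [1, 2, 3, 4, 5])
result = process_purchases_list(double)
print(result)
```

This accepts both lists and one-shot iterators. tee() is more useful when the copies are consumed roughly in step with each other, so that only a small buffer is needed.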