I'm trying to learn the use cases for `yield` vs `return`. Here, I'm cleaning up a dictionary, but it appears `return` is faster. Is it the case that `yield` is mostly faster only when we don't need to run through all iterations, 0 to imax?
TLDR:
The differences in timings you see are due to the difference in performance between building a dictionary item by item and building a list of tuples and then casting that to a dictionary. They are NOT the result of some performance difference between `return` and `yield`.
Details:
As you have implemented and observed with your two strategies, the one that uses `return` is faster than the one that uses `yield`, but that may be a result of the differences between your strategies rather than of `return` vs `yield` themselves.
Your `return` code builds a dictionary piece by piece and then returns it, while your `yield` strategy yields tuples that you gather into a list and then cast to a dictionary.
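To make the two construction paths concrete, here is a stripped-down sketch with the enable/disable logic removed. `pairs_source` is a hypothetical stand-in for the generator; the point is only that building item by item, casting a list of tuples, and passing the generator straight to `dict()` all end in the same dictionary, so any timing gap comes from how the dictionary gets built.

```python
def pairs_source():
    # Hypothetical stand-in for reverse_disable_to_enable_yield
    yield "enable_1", True
    yield "enable_2", False

# 1) Item by item, like the "return" version
by_item = {}
for key, val in pairs_source():
    by_item[key] = val

# 2) Gather tuples into a list, then cast the list to a dict
via_list = dict(list(pairs_source()))

# 3) Hand the generator straight to dict(); no intermediate list
via_gen = dict(pairs_source())

print(by_item == via_list == via_gen)  # True
```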
What happens if we compare the timings of returning a list of tuples vs yielding tuples into a list? We will find that the performance is essentially the same.
We'll set up three methods that ultimately produce the same result (your dictionary).
First, let's build some data to test with:
```python
import random
## --------------------------
## Some random input data
## --------------------------
feature_dict = {
    f"{'enable' if i%2 else 'disable'}_{i}": random.choice([True, False])
    for i in range(1000)
}
## --------------------------
```
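If you want repeated runs to use identical data, one option (not part of the original setup) is to seed the generator first. The sketch also confirms the key layout: odd `i` gives `enable_i` and even `i` gives `disable_i`, so half of the keys need renaming.

```python
import random

random.seed(0)  # assumption: a fixed seed so every run sees the same data

feature_dict = {
    f"{'enable' if i % 2 else 'disable'}_{i}": random.choice([True, False])
    for i in range(1000)
}

# Odd i -> "enable_i", even i -> "disable_i"
n_enable = sum(k.startswith("enable") for k in feature_dict)
n_disable = sum(k.startswith("disable") for k in feature_dict)
print(n_enable, n_disable)  # 500 500
```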
Next, our three test methods.
```python
## --------------------------
## Your "return" strategy
## --------------------------
def reverse_disable_to_enable_return(dic):
    new_dic = {}
    for key, val in dic.items():
        if "enabl" in key:
            new_dic[key] = val
        if "disabl" in key:
            modified_key = key.replace("disable", "enable")
            if val == False:
                new_dic[modified_key] = True
            elif val == True:
                new_dic[modified_key] = False
    return new_dic
## --------------------------
```
```python
## --------------------------
## Your "yield" strategy (requires cast to dict for compatibility with return)
## --------------------------
def reverse_disable_to_enable_yield(dic):
    for key, val in dic.items():
        if "enabl" in key:
            yield key, val
        if "disabl" in key:
            modified_key = key.replace("disable", "enable")
            if val == False:
                yield modified_key, True
            elif val == True:
                yield modified_key, False
## --------------------------
```
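One thing worth seeing directly: calling the `yield` version does not run its body at all. It returns a generator object, and work happens only as items are pulled out of it. The sample dictionary below is made up for illustration.

```python
import types

def reverse_disable_to_enable_yield(dic):
    for key, val in dic.items():
        if "enabl" in key:
            yield key, val
        if "disabl" in key:
            modified_key = key.replace("disable", "enable")
            if val == False:
                yield modified_key, True
            elif val == True:
                yield modified_key, False

sample = {"enable_1": True, "disable_2": True}
gen = reverse_disable_to_enable_yield(sample)  # nothing has executed yet
print(isinstance(gen, types.GeneratorType))    # True

first = next(gen)   # runs the loop until the first yield
rest = list(gen)    # drains whatever is left
print(first)        # ('enable_1', True)
print(rest)         # [('enable_2', False)]
```

Because the consumer (`dict()` or `list()`) pulls every item in this post's benchmarks, the generator ends up doing the same total work as the list-building version.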
```python
## --------------------------
## Your "return" strategy modified to return a list to match the yield
## --------------------------
def reverse_disable_to_enable_return_apples(dic):
    new_list = []
    for key, val in dic.items():
        if "enabl" in key:
            new_list.append((key, val))
        if "disabl" in key:
            modified_key = key.replace("disable", "enable")
            if val == False:
                new_list.append((modified_key, True))
            elif val == True:
                new_list.append((modified_key, False))
    return new_list
## --------------------------
```
Now, let's validate that these produce the same result:
```python
## --------------------------
## Do these produce the same result?
## --------------------------
a = reverse_disable_to_enable_return(feature_dict)
b = dict(reverse_disable_to_enable_return_apples(feature_dict))
c = dict(reverse_disable_to_enable_yield(feature_dict))
print(a == feature_dict)
print(a == b)
print(a == c)
## --------------------------
```
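As we hoped, this prints the following (the first line is `False` because the `disable_*` keys are renamed, so the result should differ from the input):

```
False
True
True
```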
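Comparing the three results with each other shows they agree, but not that they are right. A stricter check, sketched here as a hypothetical helper `check_reversal`, verifies the transformation itself. It assumes every value is a bool and uses `startswith("disable")` where the original code uses a substring test, which is equivalent for this data.

```python
def check_reversal(original, result):
    """Hypothetical helper: True if `result` equals `original` with every
    disable_* key renamed to enable_* and its value negated.
    Assumes all values are bools."""
    for key, val in original.items():
        if key.startswith("disable"):
            new_key = key.replace("disable", "enable", 1)
            if result.get(new_key) != (not val):
                return False
        elif result.get(key) != val:
            return False
    # No extra keys should appear in the result
    return len(result) == len(original)

original = {"enable_1": True, "disable_2": True, "disable_4": False}
expected = {"enable_1": True, "enable_2": False, "enable_4": True}
print(check_reversal(original, expected))  # True
print(check_reversal(original, original))  # False
```

With the post's data you would call `check_reversal(feature_dict, a)`.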
Now, what about timing?
Let's establish the base setup context:
```python
import timeit
setup = '''
import random
feature_dict = {
    f"{'enable' if i%2 else 'disable'}_{i}": random.choice([True, False])
    for i in range(1000)
}
def reverse_disable_to_enable_return(dic):
    new_dic = {}
    for key, val in dic.items():
        if "enabl" in key:
            new_dic[key] = val
        if "disabl" in key:
            modified_key = key.replace("disable", "enable")
            if val == False:
                new_dic[modified_key] = True
            elif val == True:
                new_dic[modified_key] = False
    return new_dic
def reverse_disable_to_enable_return_apples(dic):
    new_list = []
    for key, val in dic.items():
        if "enabl" in key:
            new_list.append((key, val))
        if "disabl" in key:
            modified_key = key.replace("disable", "enable")
            if val == False:
                new_list.append((modified_key, True))
            elif val == True:
                new_list.append((modified_key, False))
    return new_list
def reverse_disable_to_enable_yield(dic):
    for key, val in dic.items():
        if "enabl" in key:
            yield key, val
        if "disabl" in key:
            modified_key = key.replace("disable", "enable")
            if val == False:
                yield modified_key, True
            elif val == True:
                yield modified_key, False
'''
```
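As an aside, `timeit.timeit` also accepts a `globals` argument (Python 3.5+), which lets the timed statement see names already defined in your module, so the functions don't have to be copied into a setup string. A minimal sketch with two hypothetical functions:

```python
import timeit

def build_list(n):
    # Hypothetical "return" style: build the whole list, then return it
    return [(f"enable_{i}", True) for i in range(n)]

def yield_pairs(n):
    # Hypothetical "yield" style: produce the same tuples one at a time
    for i in range(n):
        yield f"enable_{i}", True

data_size = 1000

# globals=globals() makes build_list, yield_pairs and data_size visible
# to the timed statements without a setup string.
t_return = timeit.timeit("build_list(data_size)", globals=globals(), number=100)
t_yield = timeit.timeit("list(yield_pairs(data_size))", globals=globals(), number=100)
print(f"return: {t_return:.4f}s  yield: {t_yield:.4f}s")
```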
Now we are ready to do some timing. Let's try:
```python
timings_a = timeit.timeit("reverse_disable_to_enable_return(feature_dict)", setup=setup, number=10_000)
print(f"reverse_disable_to_enable_return: {timings_a}")
timings_b = timeit.timeit("dict(reverse_disable_to_enable_yield(feature_dict))", setup=setup, number=10_000)
print(f"reverse_disable_to_enable_yield: {timings_b}")
```
On my laptop this gives:
```
reverse_disable_to_enable_return: 2.30
reverse_disable_to_enable_yield: 2.71
```
This confirms what you observed: `yield` is apparently slower than `return`.
BUT, remember, this is not really an apples-to-apples test.
Let's try our third method:
```python
timings_c = timeit.timeit("dict(reverse_disable_to_enable_return_apples(feature_dict))", setup=setup, number=10_000)
print(f"reverse_disable_to_enable_return_apples: {timings_c}")
```
This gives us a much closer match to the `yield` case:

```
reverse_disable_to_enable_return_apples: 2.9009995
```
In fact, let's take the cast to `dict()` out and compare returning a list of tuples with yielding tuples to build a list:
```python
timings_b = timeit.timeit("list(reverse_disable_to_enable_yield(feature_dict))", setup=setup, number=10_000)
print(f"reverse_disable_to_enable_yield: {timings_b}")
timings_c = timeit.timeit("reverse_disable_to_enable_return_apples(feature_dict)", setup=setup, number=10_000)
print(f"reverse_disable_to_enable_return_apples: {timings_c}")
```
Now we get:
```
reverse_disable_to_enable_yield: 2.13
reverse_disable_to_enable_return_apples: 2.13
```
This shows that, over 10k calls, the time to build and return a list of tuples is essentially identical to the time to yield those same tuples into a list, as we might expect.
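This ties back to the original question. Where `yield` can pay off is when the consumer stops early: a generator generally does only as much work as is actually pulled from it, while a `return`-style function has already built everything. A small sketch with two hypothetical functions that count their own loop iterations:

```python
calls = {"return": 0, "yield": 0}

def squares_return(n):
    out = []
    for i in range(n):
        calls["return"] += 1
        out.append(i * i)
    return out

def squares_yield(n):
    for i in range(n):
        calls["yield"] += 1
        yield i * i

# Stop at the first square greater than 50
first_r = next(x for x in squares_return(1000) if x > 50)
first_y = next(x for x in squares_yield(1000) if x > 50)
print(first_r, first_y)  # 64 64
print(calls)             # {'return': 1000, 'yield': 9}
```

Both find 64, but the list version looped 1000 times while the generator stopped after 9 iterations (i = 0 through 8).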
Summary:
The differences in timings you see are due to the difference in performance between building a dictionary item by item and building a list of tuples and then casting that to a dictionary. They are NOT the result of some performance difference between `return` and `yield`.
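To test that claim in isolation, you could time only the two dictionary-building strategies on a prepared list of pairs, with no `yield` or `return` involved. This sketch uses `timeit.repeat` and takes the minimum of each set of runs, which the `timeit` documentation suggests as the least noisy figure; the data and counts are arbitrary, and no particular result is implied.

```python
import timeit

setup = "pairs = [(f'key_{i}', i % 2 == 0) for i in range(1000)]"

# Strategy 1: insert items into a dict one at a time
stmt_loop = """
d = {}
for k, v in pairs:
    d[k] = v
"""

# Strategy 2: pass the whole list of pairs to dict()
stmt_cast = "d = dict(pairs)"

# repeat() returns one total per run; compare the best runs
loop_runs = timeit.repeat(stmt_loop, setup=setup, repeat=5, number=200)
cast_runs = timeit.repeat(stmt_cast, setup=setup, repeat=5, number=200)
print(f"item by item: {min(loop_runs):.4f}s   dict(list): {min(cast_runs):.4f}s")
```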