Yield slower than return in some cases?

I'm trying to learn use cases for yield vs return. Here, I'm cleaning up a dictionary. But it appears return is faster here. Is it the case that yield is mostly faster only when we don't need to run through all iterations 0 to imax?

Solution

TLDR:

The difference in timings you see comes from building a dictionary item by item versus building a list of tuples and then casting that to a dictionary. It is NOT the result of some performance difference between return and yield.

Details:

As you observed with your two strategies, the one that returns is faster than the one that yields, but that may be a result of the differences between your strategies rather than of return vs yield.

Your return code builds a dictionary piece by piece and then returns it, while your yield strategy yields tuples that you gather into a list and then cast to a dictionary.
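To see that construction cost in isolation, here is a minimal sketch. The helper names build_directly and build_via_pairs and the sample pairs data are made up for illustration and are not taken from your code. It times filling a dictionary key by key against first collecting (key, value) tuples in a list and then passing that list to dict(). The second approach allocates a tuple per item and makes a second pass over the data. Absolute numbers depend on the machine and Python version, so none are shown.

```python
import timeit

# Hypothetical sample data: 1000 (key, value) pairs.
pairs = [(f"enable_{i}", i % 2 == 0) for i in range(1000)]

def build_directly(items):
    # Fill the dictionary one key at a time.
    result = {}
    for key, val in items:
        result[key] = val
    return result

def build_via_pairs(items):
    # Create a (key, value) tuple per item in an intermediate list,
    # then convert the whole list with dict().
    collected = []
    for key, val in items:
        collected.append((key, val))
    return dict(collected)

t_direct = timeit.timeit(lambda: build_directly(pairs), number=1_000)
t_pairs = timeit.timeit(lambda: build_via_pairs(pairs), number=1_000)
print(f"direct dict:       {t_direct:.4f}s")
print(f"list, then dict(): {t_pairs:.4f}s")
```

Neither function uses yield, so any gap between the two timings comes purely from how the dictionary is assembled.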

What happens if we compare the timings of returning a list of tuples vs yielding tuples into a list? What we will find is that the performance is essentially the same.

We will compare three methods that ultimately produce the same result (your dictionary).

First, let's build some data to test with:

import random

## --------------------------
## Some random input data
## --------------------------
feature_dict = {
    f"{'enable' if i%2 else 'disable'}_{i}": random.choice([True, False])
    for i in range(1000)
}
## --------------------------

Next, our three test methods.

## --------------------------
## Your "return" strategy
## --------------------------
def reverse_disable_to_enable_return(dic):
    new_dic = {}
    for key, val in dic.items():
        if "enabl" in key:
            new_dic[key] = val
        if "disabl" in key:
            modified_key = key.replace("disable", "enable")
            if val == False:
                new_dic[modified_key] = True
            elif val == True:
                new_dic[modified_key] = False
    return new_dic
## --------------------------

## --------------------------
## Your "yield" strategy (requires cast to dict for compatibility with return)
## --------------------------
def reverse_disable_to_enable_yield(dic):
    for key, val in dic.items():
        if "enabl" in key:
            yield key, val
        if "disabl" in key:
            modified_key = key.replace("disable", "enable")
            if val == False:
                yield modified_key, True
            elif val == True:
                yield modified_key, False
## --------------------------

## --------------------------
## Your "return" strategy modified to return a list to match the yield
## --------------------------
def reverse_disable_to_enable_return_apples(dic):
    new_list = []
    for key, val in dic.items():
        if "enabl" in key:
            new_list.append((key, val))
        if "disabl" in key:
            modified_key = key.replace("disable", "enable")
            if val == False:
                new_list.append((modified_key, True))
            elif val == True:
                new_list.append((modified_key, False))
    return new_list
## --------------------------

Now, let's validate that these produce the same result:

## --------------------------
## Do these produce the same result?
## --------------------------
a = reverse_disable_to_enable_return(feature_dict)
b = dict(reverse_disable_to_enable_return_apples(feature_dict))
c = dict(reverse_disable_to_enable_yield(feature_dict))

print(a == feature_dict)
print(a == b)
print(a == c)
## --------------------------

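As we hoped, this tells us that the output differs from the input (first line) and that all three methods agree with each other (second and third lines):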

False
True
True

Now, what about timing?

Let's establish the base setup context:

import timeit

setup = '''
import random
feature_dict = {
    f"{'enable' if i%2 else 'disable'}_{i}": random.choice([True, False])
    for i in range(1000)
}

def reverse_disable_to_enable_return(dic):
    new_dic = {}
    for key, val in dic.items():
        if "enabl" in key:
            new_dic[key] = val
        if "disabl" in key:
            modified_key = key.replace("disable", "enable")
            if val == False:
                new_dic[modified_key] = True
            elif val == True:
                new_dic[modified_key] = False
    return new_dic

def reverse_disable_to_enable_return_apples(dic):
    new_list = []
    for key, val in dic.items():
        if "enabl" in key:
            new_list.append((key, val))
        if "disabl" in key:
            modified_key = key.replace("disable", "enable")
            if val == False:
                new_list.append((modified_key, True))
            elif val == True:
                new_list.append((modified_key, False))
    return new_list

def reverse_disable_to_enable_yield(dic):
    for key, val in dic.items():
        if "enabl" in key:
            yield key, val
        if "disabl" in key:
            modified_key = key.replace("disable", "enable")
            if val == False:
                yield modified_key, True
            elif val == True:
                yield modified_key, False
'''
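As an aside, copying the functions into a setup string is not the only option. timeit.timeit also accepts a globals argument (Python 3.5+) that supplies the namespace in which the timed statement runs. A minimal sketch, using a made-up helper invert_flags and made-up data sample_flags rather than the functions above:

```python
import timeit

# Hypothetical stand-ins for the real data and the function under test.
sample_flags = {f"flag_{i}": i % 2 == 0 for i in range(1000)}

def invert_flags(flags):
    # Return a new dict with every boolean value negated.
    return {key: not val for key, val in flags.items()}

# The names the timed statement needs. At module level,
# globals=globals() is a common shorthand for the same thing.
namespace = {"invert_flags": invert_flags, "sample_flags": sample_flags}

elapsed = timeit.timeit(
    "invert_flags(sample_flags)",
    globals=namespace,
    number=1_000,
)
print(f"invert_flags: {elapsed:.4f}s")

# timeit.repeat runs the measurement several times; the minimum is
# usually the least noisy figure.
best = min(timeit.repeat(
    "invert_flags(sample_flags)",
    globals=namespace,
    repeat=5,
    number=1_000,
))
print(f"best of 5:    {best:.4f}s")
```

The setup string used in this answer works just as well; the globals form simply avoids keeping two copies of each function in sync.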

Now we are ready to do some timing.

Let's try:

timings_a = timeit.timeit("reverse_disable_to_enable_return(feature_dict)", setup=setup, number=10_000)
print(f"reverse_disable_to_enable_return: {timings_a}")

timings_b = timeit.timeit("dict(reverse_disable_to_enable_yield(feature_dict))", setup=setup, number=10_000)
print(f"reverse_disable_to_enable_yield: {timings_b}")

On my laptop this gives:

reverse_disable_to_enable_return: 2.30
reverse_disable_to_enable_yield: 2.71

This confirms what you observed: yield is apparently slower than return.

BUT, remember, this is not really an apples-to-apples test.

Let's try our third method:

timings_c = timeit.timeit("dict(reverse_disable_to_enable_return_apples(feature_dict))", setup=setup, number=10_000)
print(f"reverse_disable_to_enable_return_apples: {timings_c}")

giving us a much closer match to our yield case:

reverse_disable_to_enable_return_apples: 2.9009995

In fact, let's take the cast to dict() out and look at returning a list of tuples vs yielding tuples to build a list:

timings_b = timeit.timeit("list(reverse_disable_to_enable_yield(feature_dict))", setup=setup, number=10_000)
print(f"reverse_disable_to_enable_yield: {timings_b}")

timings_c = timeit.timeit("reverse_disable_to_enable_return_apples(feature_dict)", setup=setup, number=10_000)
print(f"reverse_disable_to_enable_return_apples: {timings_c}")

Now we get:

reverse_disable_to_enable_yield: 2.13
reverse_disable_to_enable_return_apples: 2.13

This shows that over 10k calls the time to build and return a list of tuples is essentially identical to the time to yield those same tuples and build a list, as we might expect.
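On the second half of your question: a generator can save time when the consumer stops early, because values are produced only on demand. A minimal sketch, with made-up function names and a hypothetical calls counter added only to make the amount of work visible:

```python
from itertools import islice

# Hypothetical counters that record how many items each version computes.
calls = {"list": 0, "gen": 0}

def squares_list(n):
    # Eager: computes all n values before returning anything.
    out = []
    for i in range(n):
        calls["list"] += 1
        out.append(i * i)
    return out

def squares_gen(n):
    # Lazy: computes a value only when the consumer asks for one.
    for i in range(n):
        calls["gen"] += 1
        yield i * i

first_three_eager = squares_list(1000)[:3]
first_three_lazy = list(islice(squares_gen(1000), 3))

print(first_three_eager, calls["list"])  # [0, 1, 4] 1000
print(first_three_lazy, calls["gen"])    # [0, 1, 4] 3
```

When every item is consumed, as in your dictionary clean-up, this advantage disappears and the two approaches do the same amount of work.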

Summary:

The difference in timings you see comes from building a dictionary item by item versus building a list of tuples and then casting that to a dictionary. It is NOT the result of some performance difference between return and yield.