Search code examples
pythonrecursiongeneratoryieldyield-from

Mixing yield and return. `yield [cand]; return` vs `return [[cand]]`. Why do they lead to different output?


Why does

yield [cand]
return

lead to different output/behavior than

return [[cand]]

Minimal viable example

  • uses recursion
  • the output of the version using yield [1]; return is different than the output of the version using return [[1]]
def foo(i):
    if i != 1:
        yield [1]
        return
    yield from foo(i-1)    

def bar(i):
    if i != 1:  
        return [[1]]
    yield from bar(i-1)

print(list(foo(1))) # [[1]]
print(list(bar(1))) # []

Min viable counter example

  • does not use recurion
  • the output of the version using yield [1]; return is the same as the output of the version using return [[1]]
def foo():
    yield [1]
    return

def foofoo():
    yield from foo()

def bar():
    return [[1]]

def barbar():
    yield from bar()

print(list(foofoo())) # [[1]]
print(list(barbar())) # [[1]]

Full context

I'm solving Leetcode #39: Combination Sum and was wondering why one solution works, but not the other:

Working solution

from functools import cache # requires Python 3.9+

class Solution:
    def combinationSum(self, candidates: List[int], target: int) -> List[List[int]]:
        @cache
        def helper(targ, i=0):
            if i == N or targ < (cand := candidates[i]):
                return
            if targ == cand:
                yield [cand]
                return
            for comb in helper(targ - cand, i):
                yield comb + [cand]
            yield from helper(targ, i+1)
        
        N = len(candidates)
        candidates.sort()
        yield from helper(target)

Non-working solution

from functools import cache # requires Python 3.9+

class Solution:
    def combinationSum(self, candidates: List[int], target: int) -> List[List[int]]:
        @cache
        def helper(targ, i=0):
            if i == N or targ < (cand := candidates[i]):
                return
            if targ == cand:
                return [[cand]]
            for comb in helper(targ - cand, i):
                yield comb + [cand]
            yield from helper(targ, i+1)
        
        N = len(candidates)
        candidates.sort()
        yield from helper(target)

Output

On the following input

candidates = [2,3,6,7]
target = 7
print(Solution().combinationSum(candidates, target))

the working solution correctly prints

[[3,2,2],[7]]

while the non-working solution prints

[]

I'm wondering why yield [cand]; return works, but return [[cand]] doesn't.


Solution

  • In a generator function, return just defines the value associated with the StopIteration exception implicitly raised to indicate an iterator is exhausted. It's not produced during iteration, and most iterating constructs (e.g. for loops) intentionally ignore the StopIteration exception (it means the loop is over, you don't care if someone attached random garbage to a message that just means "we're done").

    For example, try:

    >>> def foo():
    ...     yield 'onlyvalue'  # Existence of yield keyword makes this a generator
    ...     return 'returnvalue'
    ...
    
    >>> f = foo()  # Makes a generator object, stores it in f
    
    >>> next(f)  # Pull one value from generator
    'onlyvalue'
    
    >>> next(f)  # There is no other yielded value, so this hits the return; iteration over
    --------------------------------------------------------------------------
    StopIteration                            Traceback (most recent call last)
    ...
    StopIteration: 'returnvalue'
    

    As you can see, your return value does get "returned" in a sense (it's not completely discarded), but it's never seen by anything iterating normally, so it's largely useless. Outside of rare cases involving using generators as coroutines (where you're using .send() and .throw() on instances of the generator and manually advancing it with next(genobj)), the return value of a generator won't be seen.

    In short, you have to pick one:

    1. Use yield anywhere in a function, and it's a generator (whether or not the code path of a particular call ever reaches a yield) and return just ends generation (while maybe hiding some data in the StopIteration exception). No matter what you do, calling the generator function "returns" a new generator object (which you can loop over until exhausted), it can never return a raw value computed inside the generator function (which doesn't even begin running until you loop over it at least once).
    2. Don't use yield, and return works as expected (because it's not a generator function).

    As an example to explain what happens to the return value in normal looping constructs, this is what for x in gen(): effectively expands to a C optimized version of:

    __unnamed_iterator = iter(gen())
    while True:
        try:
            x = next(__unnamed_iterator)
        except StopIteration:  # StopIteration caught here without inspecting it
            break              # Loop ends, StopIteration exception cleaned even from sys.exc_info() to avoid possible reference cycles
    
        # body of loop goes here
    
    # Outside of loop, there is no StopIteration object left
    

    As you can see, the expanded form of the for loop has to look for a StopIteration to indicate the loop is over, but it doesn't use it. And for anything that's not a generator, the StopIteration never has any associated values; the for loop has no way to report them even if it did (it has to end the loop when it's told iteration is over, and the arguments to StopIteration are explicitly not part of the values iterated anyway). Anything else that consumes the generator (e.g. calling list on it) is doing roughly the same thing as the for loop, ignoring the StopIteration in the same way; nothing except code that specifically expects generators (as opposed to more generalized iterables and iterators) will ever bother to inspect the StopIteration object (at the C layer, there are optimizations that StopIteration objects aren't even produced by most iterators; they return NULL and leave the set exception empty, which all iterator protocol using things know is equivalent to returning NULL and setting a StopIteration object, so for anything but a generator, there isn't even an exception to inspect much of the time).