
Generator expressions vs generator functions and surprisingly eager evaluation


For reasons that are not relevant here, I am combining some data structures in a certain way while also replacing Python 2.7's default dict with OrderedDict. The data structures use tuples as keys in dictionaries. Please ignore those details (the replacement of the dict type is not useful in the example below, but it is in the real code).

import __builtin__
import collections
import contextlib
import itertools


def combine(config_a, config_b):
    return (dict(first, **second) for first, second in itertools.product(config_a, config_b))


@contextlib.contextmanager
def dict_as_ordereddict():
    dict_orig = __builtin__.dict
    try:
        __builtin__.dict = collections.OrderedDict
        yield
    finally:
        __builtin__.dict = dict_orig

This works as expected initially (dict can take non-string keyword arguments as a special case):

print 'one level nesting'
with dict_as_ordereddict():
    result = combine(
        [{(0, 1): 'a', (2, 3): 'b'}],
        [{(4, 5): 'c', (6, 7): 'd'}]
    )
print list(result)
print

Output:

one level nesting
[{(0, 1): 'a', (4, 5): 'c', (2, 3): 'b', (6, 7): 'd'}]

However, when nesting calls to combine, it can be seen that the dict reference inside the generator expression is resolved as OrderedDict, which lacks dict's special behaviour of accepting tuples as keyword arguments:

print 'two level nesting'
with dict_as_ordereddict():
    result = combine(combine(
        [{(0, 1): 'a', (2, 3): 'b'}],
        [{(4, 5): 'c', (6, 7): 'd'}]
    ),
        [{(8, 9): 'e', (10, 11): 'f'}]
    )
print list(result)
print

Output:

two level nesting
Traceback (most recent call last):
  File "test.py", line 36, in <module>
    [{(8, 9): 'e', (10, 11): 'f'}]
  File "test.py", line 8, in combine
    return (dict(first, **second) for first, second in itertools.product(config_a, config_b))
  File "test.py", line 8, in <genexpr>
    return (dict(first, **second) for first, second in itertools.product(config_a, config_b))
TypeError: __init__() keywords must be strings

Furthermore, implementing combine with yield instead of a generator expression fixes the problem:

def combine_yield(config_a, config_b):
    for first, second in itertools.product(config_a, config_b):
        yield dict(first, **second)


print 'two level nesting, yield'
with dict_as_ordereddict():
    result = combine_yield(combine_yield(
        [{(0, 1): 'a', (2, 3): 'b'}],
        [{(4, 5): 'c', (6, 7): 'd'}]
    ),
        [{(8, 9): 'e', (10, 11): 'f'}]
    )
print list(result)
print

Output:

two level nesting, yield
[{(0, 1): 'a', (8, 9): 'e', (2, 3): 'b', (4, 5): 'c', (6, 7): 'd', (10, 11): 'f'}]

Questions:

  1. Why does some item (only the first?) from the generator expression get evaluated before it is required in the second example, and what is it required for?
  2. Why is it not evaluated in the first example? I actually expected this behaviour in both.
  3. Why does the yield-based version work?

Solution

  • Before going into the details, note the following: itertools.product evaluates its iterable arguments immediately in order to compute the product. This can be seen from the equivalent Python implementation in the docs (the first line of the function body is relevant):

    def product(*args, **kwds):
        pools = map(tuple, args) * kwds.get('repeat', 1)
        ...
    

    You can also try this with a custom class and a short test script:

    import itertools
    
    
    class Test:
        def __init__(self):
            self.x = 0
    
        def __iter__(self):
            return self
    
        def next(self):
            print('next item requested')
            if self.x < 5:
                self.x += 1
                return self.x
            raise StopIteration()
    
    
    t = Test()
    itertools.product(t, t)
    

    Creating the itertools.product object will show in the output that all the iterators' items are immediately requested.

    This means that as soon as you call itertools.product, the iterator arguments are evaluated. This is important because in the first case the arguments are just two lists, so there is no problem. You then evaluate the final result via list(result) after the context manager dict_as_ordereddict has exited, so all calls to dict in the generator expression's body are resolved as the normal builtin dict.
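
    To make that timing visible in isolation, here is a minimal standalone sketch (my own, not part of the question's code) showing that the name dict in a generator expression's body is only looked up when an item is actually produced:

    import __builtin__
    import collections

    # No item has been produced yet, so dict has not been looked up.
    gen = (dict(x=1) for _ in range(1))

    dict_orig = __builtin__.dict
    __builtin__.dict = collections.OrderedDict
    try:
        # The body runs only now, so dict resolves to whatever the name
        # points to at consumption time, here OrderedDict.
        print type(list(gen)[0])  # <class 'collections.OrderedDict'>
    finally:
        __builtin__.dict = dict_orig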

    For the second example, the inner call to combine still works fine, returning a generator expression which is then used as one of the arguments to the outer combine's call to itertools.product. As we have seen above, these arguments are evaluated immediately, and so the inner generator object is asked to produce its values. In order to do so, it needs to resolve dict. However, at this point we are still inside the context manager dict_as_ordereddict, and for that reason dict resolves to OrderedDict, which doesn't accept non-string keys as keyword arguments.

    It is important to notice here that the first version, which uses return, needs to create the generator expression object in order to return it; creating it evaluates the itertools.product(...) call right away. That means this version is only as lazy as itertools.product.
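
    As a quick illustration of that point (a sketch I added, not from the question), a helper generator that announces when it is consumed shows that the return-based combine triggers itertools.product, and therefore the consumption of its arguments, as soon as it is called:

    import itertools


    def noisy(label, items):
        # Generator helper used only for this demonstration.
        print '%s is being consumed' % label
        for item in items:
            yield item


    def combine_return(config_a, config_b):
        # Same shape as the original combine(): build and return a genexp.
        return (pair for pair in itertools.product(config_a, config_b))


    result = combine_return(noisy('config_a', [1, 2]), noisy('config_b', [3, 4]))
    print 'combine_return has returned'  # both messages were already printed
    print list(result)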

    Now to the question why the yield version works. By using yield, invoking the function returns a generator. This is a truly lazy version in the sense that execution of the function body doesn't start until items are requested. This means neither the inner nor the outer call to combine will start executing the function body, and thus invoking itertools.product, until the items are requested via list(result). You can check that by putting an additional print statement inside that function and another one right after the context manager:

    def combine(config_a, config_b):
        print 'start'
        # return (dict(first, **second) for first, second in itertools.product(config_a, config_b))
        for first, second in itertools.product(config_a, config_b):
            yield dict(first, **second)
    
    with dict_as_ordereddict():
        result = combine(combine(
            [{(0, 1): 'a', (2, 3): 'b'}],
            [{(4, 5): 'c', (6, 7): 'd'}]
        ),
            [{(8, 9): 'e', (10, 11): 'f'}]
        )
    print 'end of context manager'
    print list(result)
    print
    

    With the yield version we'll notice that it prints the following:

    end of context manager
    start
    start
    

    I.e. the generators are started only when the results are requested via list(result). This is different from the return version (uncomment the return line in the code above and comment out the yield loop). Then you'll see

    start
    start
    

    and the error is raised before the end of the context manager is even reached.

    On a side note: for your code to work, the replacement of dict needs to be ineffective (and it is for the first version), so I don't see why you would use that context manager at all. Secondly, dict literals are not ordered in Python 2, and neither are keyword arguments, so that also defeats the purpose of using OrderedDict. Also note that in Python 3 the non-string keyword argument behaviour of dict has been removed, and the clean way to update dictionaries with keys of any type is to use dict.update.
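
    For completeness, here is a possible dict.update-based variant of combine (my sketch, not the poster's code) that avoids the keyword-argument trick and therefore accepts keys of any hashable type on both Python 2 and Python 3:

    import itertools


    def combine_update(config_a, config_b):
        for first, second in itertools.product(config_a, config_b):
            merged = dict(first)    # copy the first mapping
            merged.update(second)   # update works with non-string keys
            yield merged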