For reasons which are not relevant I am combining some data structures in a certain way, whilst also replacing the Python 2.7's default dict
with OrderedDict
. The data structures use tuples as keys in dictionaries. Please ignore those details (the replacement of the dict
type is not useful below, but it is in the real code).
import __builtin__
import collections
import contextlib
import itertools
def combine(config_a, config_b):
return (dict(first, **second) for first, second in itertools.product(config_a, config_b))
@contextlib.contextmanager
def dict_as_ordereddict():
dict_orig = __builtin__.dict
try:
__builtin__.dict = collections.OrderedDict
yield
finally:
__builtin__.dict = dict_orig
This works as expected initially (dict
can take non-string keyword arguments as a special case):
print 'one level nesting'
with dict_as_ordereddict():
result = combine(
[{(0, 1): 'a', (2, 3): 'b'}],
[{(4, 5): 'c', (6, 7): 'd'}]
)
print list(result)
print
Output:
one level nesting
[{(0, 1): 'a', (4, 5): 'c', (2, 3): 'b', (6, 7): 'd'}]
However, when nesting calls to the combine
generator expression, it can be seen that the dict
reference is treated as OrderedDict
, lacking the special behaviour of dict
to use tuples as keyword arguments:
print 'two level nesting'
with dict_as_ordereddict():
result = combine(combine(
[{(0, 1): 'a', (2, 3): 'b'}],
[{(4, 5): 'c', (6, 7): 'd'}]
),
[{(8, 9): 'e', (10, 11): 'f'}]
)
print list(result)
print
Output:
two level nesting
Traceback (most recent call last):
File "test.py", line 36, in <module>
[{(8, 9): 'e', (10, 11): 'f'}]
File "test.py", line 8, in combine
return (dict(first, **second) for first, second in itertools.product(config_a, config_b))
File "test.py", line 8, in <genexpr>
return (dict(first, **second) for first, second in itertools.product(config_a, config_b))
TypeError: __init__() keywords must be strings
Furthermore, implementing via yield
instead of a generator expression fixes the problem:
def combine_yield(config_a, config_b):
for first, second in itertools.product(config_a, config_b):
yield dict(first, **second)
print 'two level nesting, yield'
with dict_as_ordereddict():
result = combine_yield(combine_yield(
[{(0, 1): 'a', (2, 3): 'b'}],
[{(4, 5): 'c', (6, 7): 'd'}]
),
[{(8, 9): 'e', (10, 11): 'f'}]
)
print list(result)
print
Output:
two level nesting, yield
[{(0, 1): 'a', (8, 9): 'e', (2, 3): 'b', (4, 5): 'c', (6, 7): 'd', (10, 11): 'f'}]
Questions:
yield
-based version work?Before going into the details note the following: itertools.product
evaluates the iterator arguments in order to compute the product. This can be seen from the equivalent Python implementation in the docs (the first line is relevant):
def product(*args, **kwds):
pools = map(tuple, args) * kwds.get('repeat', 1)
...
You can also try this with a custom class and a short test script:
import itertools
class Test:
def __init__(self):
self.x = 0
def __iter__(self):
return self
def next(self):
print('next item requested')
if self.x < 5:
self.x += 1
return self.x
raise StopIteration()
t = Test()
itertools.product(t, t)
Creating the itertools.product
object will show in the output that all the iterators items are immediately requested.
This means, as soon as you call itertools.product
the iterator arguments are evaluated. This is important because in the first case the arguments are just two lists and so there's no problem. Then you evaluate the final result
via list(result
after the context manager dict_as_ordereddict
has returned and so all calls to dict
will be resolved as the normal builtin dict
.
Now for the second example the inner call to combine
works still fine, now returning a generator expression which is then used as one of the arguments to the second combine
's call to itertools.product
. As we've seen above these arguments are immediately evaluated and so the generator object is asked to generate its values. In order to do so, it needs to resolve dict
. However now we're still inside the context manager dict_as_ordereddict
and for that reason dict
will be resolved as OrderedDict
which doesn't accept non-string keys for keyword arguments.
It is important to notice here that the first version which uses return
needs to create the generator object in order to return it. That involves creating the itertools.product
object. That means this version is as lazy as itertools.product
.
Now to the question why the yield
version works. By using yield
, invoking the function will return a generator. Now this is a truly lazy version in the sense that execution of the function body doesn't start until items are requested. This means neither the inner nor the outer call to convert
will start executing the function body and thus invoking itertools.product
until the items are requested via list(result)
. You can check that by putting an additional print statement inside that function and right behind the context manager:
def combine(config_a, config_b):
print 'start'
# return (dict(first, **second) for first, second in itertools.product(config_a, config_b))
for first, second in itertools.product(config_a, config_b):
yield dict(first, **second)
with dict_as_ordereddict():
result = combine(combine(
[{(0, 1): 'a', (2, 3): 'b'}],
[{(4, 5): 'c', (6, 7): 'd'}]
),
[{(8, 9): 'e', (10, 11): 'f'}]
)
print 'end of context manager'
print list(result)
print
With the yield
version we'll notice that it prints the following:
end of context manager
start
start
I.e. the generators are started only when the results are requested via list(result)
. This is different from the return
version (uncomment in the above code). Now you'll see
start
start
and before the end of the context manager is reached the error is already raised.
On a side note, in order for your code to work, the replacement of dict
needs to be ineffective (and it is for the first version), so I don't see why you would use that context manager at all. Secondly, dict
literals are not ordered in Python 2, and neither are keyword arguments so that also defeats the purpose of using OrderedDict
. Also note that in Python 3 that non-string keyword arguments behavior of dict
has been removed and the clean way to update dictionaries of any keys is to use dict.update
.