python-3.x, generator, python-3.8

How to create a Python nested generator pipeline without a function call?


I have the following snippet of code.

x = (w for w in words if len(w) == 5)

def mk_gen(x, idx, val):
    return (w for w in x if w[idx] == val)

pattern = 'he_l_'
for idx, val in enumerate(pattern):
    if val == '_':
        continue
    x = mk_gen(x, idx, val) # Method 1
    # x = (w for w in x if w[idx] == val) # Method 2

print(list(x))
# Method 1 -> ['heald', 'heals', 'heels', 'heild', 'heily', 'heils', 'helly', 'hello', 'hells', 'herls']
# Method 2 -> []


Here, words is a list of English dictionary words.

Method 1 gives the expected output.

Method 2, however, does not, even though it builds the same generator expression as Method 1. The only difference is that Method 1 wraps the expression in a function that returns it, instead of assigning the new generator directly.

Why does this happen?

I tried another way of making a new generator, again using a function call.

def mk_gen_1(x, idx, val):
    for w in x:
        if w[idx] == val:
            yield w

This one also works, but the original Method 2 still doesn't.

Another approach, which I found in "Pythonic way to chain python generator function to form a pipeline", doesn't work either.

from functools import reduce  # reduce is not a builtin in Python 3

gen_2_steps = [(lambda g:(w for w in g if w[idx] == val)) for idx, val in enumerate(pattern) if val != '_']
x = reduce(lambda g, f: f(g), [x, *gen_2_steps]) # Method 3
print(list(x))

The following works, but it is not what I want. I want to create the generators in a for loop, without calling another function.

# Works
f2_1 = lambda g:(w for w in g if w[0] == 'h')
x = f2_1(x)
f2_2 = lambda g:(w for w in g if w[1] == 'e')
x = f2_2(x)
f2_3 = lambda g:(w for w in g if w[3] == 'l')
x = f2_3(x)
print(len(list(x)))

Solution

  • There are two different sets of variables used by Methods 1 and 2.

    Method 1 uses the local names idx and val defined by mk_gen; it basically defines a closure, where the values passed to mk_gen are remembered by the generator even after mk_gen returns.

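    Method 2 uses the names idx and val that are defined (and redefined) by the for loop, because the generator expression is in the same scope as those variables. A generator expression evaluates its condition lazily, so nothing reads idx and val until list(x) runs. By then the loop has finished with idx == 4 and val == '_', so every stacked filter tests w[4] == '_' and no word passes.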
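
The difference can be reproduced with a minimal sketch, assuming a short hypothetical word list in place of the dictionary from the question. It also shows one possible way to keep the for loop without a helper function: only the iterable of the leftmost for clause of a generator expression is evaluated when the expression is created, so the current x, idx and val can be passed in through a one-element list. The names g, i, v and steps are illustrative and not part of the original code.

```python
from functools import reduce

# Hypothetical stand-in for the dictionary word list in the question.
words = ["hello", "heels", "helps", "world", "hills", "jelly"]
pattern = "he_l_"

# Method 2 as written: each condition looks up idx and val only when
# the generator is consumed, i.e. after the loop has finished.
x = (w for w in words if len(w) == 5)
for idx, val in enumerate(pattern):
    if val == "_":
        continue
    x = (w for w in x if w[idx] == val)
late_bound = list(x)  # every filter tests w[4] == '_'

# Possible workaround without a helper function: the leftmost iterable
# [(y, idx, val)] is evaluated immediately, so g, i and v keep the
# values from the current loop iteration.
y = (w for w in words if len(w) == 5)
for idx, val in enumerate(pattern):
    if val == "_":
        continue
    y = (w for g, i, v in [(y, idx, val)] for w in g if w[i] == v)
early_bound = list(y)

# Method 3 has the same late-binding problem inside its lambdas.
# Default arguments are evaluated when each lambda is created.
steps = [
    (lambda g, i=idx, v=val: (w for w in g if w[i] == v))
    for idx, val in enumerate(pattern)
    if val != "_"
]
z = reduce(lambda g, f: f(g), steps, (w for w in words if len(w) == 5))
fixed_reduce = list(z)

print(late_bound)    # []
print(early_bound)   # ['hello', 'heels']
print(fixed_reduce)  # ['hello', 'heels']
```

The one-element list is arguably less readable than mk_gen, so the helper function from Method 1 may still be the clearer choice when a function call is acceptable.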