Search code examples
pythonrecursiongeneratorsieve-of-eratosthenesgenerator-expression

Generator Recursion in Sieve of Eratosthenes Skips Steps


I ran into an issue with the sieve algorithm I have written here. I have tried to fix it for a total of about 10 hours now. I've looked around here for similar questions but I can't seem to find anyone who has had this issue. I'm relatively new to python, and after reading through a lot of generator documentation, I managed to write code that does work. However, I still don't know why my first attempts failed.

All I've come up with is that it seems that gen1 was not actually being emptied out during each successive sieve step. So, I tried alternating between the names gen1 and gen2, deleting each one to avoid this issue. That didn't work either.

I'd really appreciate some insight into this, and any suggestions for improving what I have now.

Here is the failed code:

def primes(n):
    "yields primes up to n. For use with large n"
    q = 0
    yield 2
    gen1 = (x for x in range(3,n,2))
    while q*q < n:
        q = next(gen1)
        gen1 = (x for x in gen1 if x%q != 0)
        yield q
    else:
        while 1:
            try:
                yield next(gen1)
            except:
                StopIteration
                break

Here is my current code:

import math
global gen1
global gen
def gen1(x):
    for i in range(3,x,2):
        yield i
def gen(generator,n):
    "Input generator and current starting 'index' for the generator"
    # Recursively defines new generator for sieve of Eratosthenes
    for i in range(n+1):
        predicate = next(generator)
        yield predicate
    for i in generator:
        if i % predicate != 0:
            yield i
def primes(n):
    yield 2
    a = gen1(n)
    for i in range(math.ceil(math.sqrt(n))):
        a = gen(a,i)
    yield from a

Solution

  • The basic problem is one of scope. In this line:

      gen1 = (x for x in gen1 if x%q != 0)
    

    the value of q used is not the value q was bound to at the time the generator expression was created, but rather the (ever-changing!) value of q at the time the generator expression is executing. This works the same way as for referencing non-local variables in any kind of nested function.

    To capture bindings at creation time, the obvious and easy way is to write a function instead, and pass the values you want it to use. For example, this rewrite is more Pythonic in that and several other respects:

    def primes(n):
        "yields primes up to n. For use with large n"
    
        def gen(gen1, q):
            for x in gen1:
                if x % q:
                    yield x
    
        q = 0
        yield 2
        gen1 = iter(range(3, n, 2))
        while q*q < n:
            q = next(gen1)
            gen1 = gen(gen1, q)
            yield q
        yield from gen1
    

    Now the values used in the body of gen() are precisely those passed to it, so it's obvious instead of a mystery ;-)