Search code examples
pythonprimessieve-algorithm

Sieve of Eratosthenes Understanding


I have this code I do not quite understand because I just started learning Python a week ago.

import numpy as np
import time

start_time=time.clock()

def Sieb(n):                           #Sieve
    Eins = np.ones(n, dtype=bool)      #Eins is just german for One
    Eins[0] = Eins[1] = False          #->I don't quite understand what
    for i in range(2, n):              #this one does.
        if Eins[i]:
            Eins[i*i::i] = False       #Does this make the ones = zero?
    return np.flatnonzero(Eins)


print Sieb(1000000)
print start_time

So, I understand the concept of the sieve (I guess) but I'm not sure how it is achieved here. Where are the multiples of itself get to 0 and how does the np.flatnonzero puts out the prime numbers because before that, they are just 1 and 0?

I hope you can understand and help me. :)


Solution

  • Let's go through it step-by-step.

    Eins = np.ones(n, dtype=bool)
    

    This creates a new array of size n, with type bool, and all ones. Because of the type, "one" means True. The array represents all our numbers that we want to test for primality, with True meaning the number is prime, False meaning it is not. So we start with all numbers marked as (potential) primes.

    Eins[0] = Eins[1] = False
    

    Now we set the 0th and 1st element to False: Neither 0 nor 1 are primes.

    for i in range(2, n):
    

    Next, we'll iterate over all remaining numbers (from 2 onwards). We could get away with only going up to the square root of n, but for simplicity, we go over all numbers.

        if Eins[i]:
    

    If the ith value of the array is True, that means i is a prime. The first time this condition will be entered is with i=2. Next, we want to remove all multiples of our number from the prime candidates:

            Eins[i*i::i] = False
    

    We can read this line as if it was Eins[2*i::i] = False, starting from i*i is just an optimization¹. If 2 is a prime number, that means that 2*2, 3*2, 4*2, ... aren't, so we set the multiples to False. The indexing notation stands for "from i*i until the end" (represented by the empty space between the colons) ", in steps of i". This statement produces the numbers i*i, i*(i+1), i*(i+2), ..., so all multiples of i that haven't been marked as "not a prime" yet.

    return np.flatnonzero(Eins)
    

    And this simply returns all indices where the value is True, i.e. all prime numbers that were found.


    1: A word regarding the i*i: We can start from the square of i, because any numbers j*i (for j < i) have already been marked as nonprime when we were at j.


    Here's a demonstration that this works, for n=15:

    We start with the array filled with .ones:

     0  1  2  3  4  5  6  7  8  9 10 11 12 13 14
    [T, T, T, T, T, T, T, T, T, T, T, T, T, T, T]
    

    Then we empty Eins[0] and Eins[1]:

     0  1  2  3  4  5  6  7  8  9 10 11 12 13 14
    [F, F, T, T, T, T, T, T, T, T, T, T, T, T, T]
    

    And now we start iterating over the range, starting with i=2, and we remove every multiple of 2 starting with 2*2=4:

     0  1  2  3  4  5  6  7  8  9 10 11 12 13 14
    [F, F, T, T, F, T, F, T, F, T, F, T, F, T, F]
    

    i=3, removing every multiple of 3 starting with 3*3=9:

     0  1  2  3  4  5  6  7  8  9 10 11 12 13 14
    [F, F, T, T, F, T, F, T, F, F, F, T, F, T, F]
    

    Note that we didn't have to remove 6, because it was already removed by i=2.

    When i=4, we skip the removal because Eins[i] is False. From i=5 onwards, nothing happens, because the squares (25, 36, ...) are larger than the array. Finally, we use flatnonzero and get all indices where the value is True:

     0  1  2  3  4  5  6  7  8  9 10 11 12 13 14
    [F, F, T, T, F, T, F, T, F, F, F, T, F, T, F]
           2  3     5     7          11    13