Search code examples
calgorithmperformanceprimes

needed explanation about implementing the 6n-+1 algorithm for primes?


i was exploring primes algorithm in c, till i stumbled upon this one, and oh dear, the algorithm is really fast, knowing that its implementing the 6n-+1 property of prime after 2 to find the next nearest prime, but there's still some parts of the code i don't get, first why given that he starts from 2,3, and returns true if they appear respectively , what the use of the next condition if (nb % 2 == 0 || nb % 3 == 0) return FALSE; ?, every other prime is not divisible by 2 and 3 other than 2 and 3 and supposedly the while do the checking so what's the use of that if condition ? second the while loop i was talking about is supposedly should implement 6n-+1 property but i really don't see the relativity between the math expression and the test expression in the loop? and lastly the two condition inside the while is seems to be lovely and mathematical but i don't get the use of them that much, can you clarify please. thanks a lot in advance .

#include <stdio.h>
#define TRUE 1
#define FALSE 0
int is_prime(int nb)
{
    int divisor;

    if (nb == 2 || nb == 3)
        return TRUE;
    if (nb % 2 == 0 || nb % 3 == 0)
        return FALSE;
    divisor = 6;
    while (divisor * divisor - 2 * divisor + 1 <= nb)
    {
        if (nb % (divisor - 1) == 0)
            return FALSE;
        if (nb % (divisor + 1) == 0)
            return FALSE;
        divisor += 6;
    }
    return (1);
}

int find_next_prime(int nb)
{
    while (!is_prime(++nb))
    {};
    return nb;
}
int main()
{
    printf("%i", find_next_prime(78314683));
    return (0);
}

Solution

  • Let's start with some theory.

    First, n % prime == (n % (prime * anything)) % prime.

    This implies that these are all correct:

    n % prime1 == (n % (prime1 * prime2)) % prime1
    n % prime2 == (n % (prime1 * prime2)) % prime2
    
    temp = n % (prime1 * prime2)
    n % prime1 == temp % prime1
    n % prime2 == temp % prime2
    

    Let's choose the prime numbers 2 and 3:

    temp = n % 6
    n % 2 == temp % 2
    n % 3 == temp % 3
    

    There are only 6 possible values for temp:

    0, divisible by both 2 and 3
    1, not divisible by either 2 or 3
    2, divisible by 2
    3, divisible by 3
    4, divisible by 2
    5, not divisible by either 2 or 3
    

    This is the "6n-+1 property". Note that 2 is divisible by 2, and 3 is divisible by 3, so for those 2 cases we need an extra check to determine if "divisible by 2 or 3" actually means "is the prime number 2 or 3" (or if it means "can't be a prime number").

    But... why stop there?

    With 3 prime numbers it'd trivial to expand:

    temp = n % (prime1 * prime2 * prime3)
    n % prime1 == temp % prime1
    n % prime2 == temp % prime2
    n % prime3 == temp % prime3
    

    Let's choose the prime numbers 2, 3 and 5:

    temp = n % 30
    n % 2 == temp % 2
    n % 3 == temp % 3
    n % 5 == temp % 5
    

    Now there's 30 possible values for temp:

    0, divisible by 2, 3 and 5
    1, not divisible by either 2, 3 or 5
    2, divisible by 2
    3, divisible by 3
    4, divisible by 2
    5, divisible by 5
    6, divisible by both 2 and 3
    7, not divisible by either 2, 3 or 5
    8, divisible by 2
    9, divisible by 3
    10, divisible by 2 and 5
    11, not divisible by either 2, 3 or 5
    12, divisible by both 2 and 3
    13, not divisible by either 2, 3 or 5
    14, divisible by 2
    15, divisible by 3 and 5
    16, divisible by 2
    17, not divisible by either 2, 3 or 5
    18, divisible by both 2 and 3
    19, not divisible by either 2, 3 or 5
    20, divisible by 2 and 5
    21, divisible by 3
    22, divisible by 2
    23, not divisible by either 2, 3 or 5
    24, divisible by both 2, 3 and 5
    25, divisible by 5
    26, divisible by 2
    27, divisible by 3
    28, divisible by 2
    29, not divisible by either 2, 3 or 5
    

    We can pack 30 possibilities into 30 bits and get the constant 0x208A2882; then use if( (0x208A2882ULL & (1 << n%30)) != 0) /* Not divisible by 2, 3 or 5 */.

    But... why stop there?

    If you chose 2, 3 and 7 you'd need 42 bits (for 42 possibilities of n%(2*3*7)) and could use a uint64_t constant; and if you chose 5 and 11 you'd need 55 bits and could use a different uint64_t constant; so with 2 constants you could determine "divisible by 2, 3, 5, 7 or 11".

    But... why stop there?

    There's no reason you couldn't choose 2, 3, 5, 7, 11 and 13; and do temp = n % (2*3*5*7*11*13) and have a large (30030 bit) bitfield instead a single integer constant.

    But... why stop there?

    For a lot of cases (your find_next_prime(int nb)) a single "is/isn't divisible" flag is nice, but an "offset to next integer that isn't divisible" would be a whole lot better because you could completely skip lots of numbers at once. You'd be able to do something like:

    int find_next_prime(int nb)
    {
        do {
            if(nb > 13) {
                nb += table[nb % 30030];
                /* nb is not divisible by 2, 3, 5, 7, 11 or 13 */