Search code examples
pythonmathprimesprime-factoring

Explain a code to check primality based on Fermat's little theorem


I found some Python code that claims checking primality based on Fermat's little theorem:

def CheckIfProbablyPrime(x):
    return (2 << x - 2) % x == 1

My questions:

  1. How does it work?
  2. What's its relation to Fermat's little theorem?
  3. How accurate is this method?
  4. If it's not accurate, what's the advantage of using it?

I found it here.


Solution

  • 1. How does it work?

    Fermat's little theorem says that if a number x is prime, then for any integer a:

    Fermat's Little Theorem Part 1

    If we divide both sides by a, then we can re-write the equation as follows:

    Fermat's Little Theorem Part 2

    I'm going to punt on proving how this works (your first question) because there are many good proofs (better than I can provide) on this wiki page and under some Google searches.

    2. Relation between code and theorem

    So, the function you posted checks if (2 << x - 2) % x == 1.

    First off, (2 << x-2) is the same thing as writing 2**(x-1), or in math-form:

    2**(x-1)

    That's because << is the logical left-shift operator, which is explained better here. The relation between bit-shifting and multiplying by powers of 2 is specific to the way that numbers are represented on computers (in binary), but it all boils down to

    2 << (x-1) == 2**(x)

    I can subtract 1 from the exponent on both sides, which gives

    2 << (x-2) == 2**(x-1)


    Now, we know from above that for any number a,

    a**(x-1) == 1 (mod x)

    Let's say then that a = 2. That gives us

    2**(x-1) == 1 (mod x)

    Well heck, that's the same as 2 << (x-2)! So then we can write:

    Almost final relation

    Which leads to the final relation:

    Final Relation


    Now, the math version of mod looks kind of odd, but we can write the equivalent code as follows:

    (2 << x - 2) % x == 1

    And that's the relation.

    3. Accuracy of method

    So, I think "accuracy" is a bad term here, because Fermat's little theorem is definitely true for all prime numbers. However, that does not mean that it's true or false for all numbers -- which is to say, if I have some number i, and I'm not sure if i is prime, using Fermat's Little Relation will only tell me if it is definitely NOT prime. If Fermat's Little Relation is true, then i could not be prime. These kinds of numbers are called pseudoprime numbers, or more specifically in this case Fermat Pseudoprime numbers.

    If this sort of thing sounds interesting, take a look at the Carmichael numbers AKA the Absolute Fermat Pseudoprimes, which pass the Fermat test in any base but are not prime. In our case we run into numbers which pass in base 2, but Fermat's little theorem might not hold for these numbers in other bases -- the Carmichael numbers pass the test for all bases coprime to x.

    On the wiki page of the Carmichael there is a discussion of their distribution over the range of natural numbers -- they appear exponentially with the size of the range over which you're looking, though the exponent is less than 1 (about 1/3). So, if you're searching for primes over a big range, you're going to run into exponentially more Carmichael numbers, which are effectively false positives for this method CheckIfProbablyPrime. That might be okay, depending on your input and how much you care about running into false positives.

    4. Why is this useful?

    In short, it's an optimization.

    The main reason to use something like this is to speed up a search for prime numbers. That's because actually checking if a number is prime is expensive -- i.e. more than O(1) running time. Doable, but still more expensive than O(1) time. So, if we can avoid doing that actual check for some numbers, we'll be able to devote more time to checking actual candidates. Since Fermat's little relation will only say yes if a number is possibly prime (it will never say no if the number is prime), and it can be checked in O(1) time, we can toss it into an is_prime loop to ignore a fair amount of numbers. So, we can speed things up.

    There are many primality checks like this one, you can find some coded prime checkers here


    Final Note

    One of the confusing things about this optimization is that it uses the bit shift operator << instead of the exponentiation operator **. This is because bit shifting is one of the fastest operations that your computer can do, while exponentiation is slower by some amount. It is not always the best optimization in many cases, because most modern languages know how to replace things we write with more optimized operations. But, that's my venture as to why the authors of this code used the bit shift instead of 2**(x-1).


    Edit: As MarkDickinson notes, taking the exponent of a number and then modding it explicitly is not the best way to do it. This is a thing called modular exponentiation, and there exist algorithms which can do it faster than the way we've written it. Python's builtin pow actually implements one of these algorithms, and takes an optional third argument to mod by. So we can write a final version of this function:

    def CheckIfProbablyPrime(x):
        return pow(2, x-1, x) == 1
    

    Which is not only more readable but also faster than the confusing bit-shift crap. You know what they say.