Search code examples
pythonmathwhile-loopmathematical-optimizationpython-3.4

What is the logic of this process


I was working on a problem in Project Euler; and I found a question in SO. The question and accepted answer says;

n = 600851475143 
i = 2

while i * i < n:
    while n%i == 0:
        n = n / i
    i = i + 1

print (n)

It's just awesome. I still can't understand how this process is so fast and can find the largest prime factor of 600billion in 0.00001 seconds. I tried tons of methods and codes for that, processes took over than 1 hour..

Could anyone explain me the logic of this codes and why it's super-fast? is while loop have a special place in Python?


Solution

  • The fundamental theorem of arithmetic states that every integer greater than 1 can be represented as the product of prime numbers. E.g., the number 2100 can be represented like so:

    2 x 2 x 3 x 5 x 5 x 7

    I've arranged it so the largest prime factor is on the right, in this case 7. What this algorithm is doing is starting from 2 and dividing n (i.e. "removing" that factor) until there are no more to remove (the modulo 0 step checks that it is divisible cleanly before dividing.)

    So, following the code, we would have i = 2 and n = 2100, or

    2 x 2 x 3 x 5 x 5 x 7

    2100 is divisible by 2 (2100 % 2 == 0) and also because we see a 2 in the factorization above. So divide it by 2 and we get 1050, or

    2 x 3 x 5 x 5 x 7

    Continue dividing by 2, once more, and you get a number that is no longer divisible by 2, that is 525, or

    3 x 5 x 5 x 7

    Then we increase i to 3 and continue. See how by the end we will be left with the highest prime factor?

    The reason for the first while loop's i * i < n (which really should be i * i <= n) is because

    if one divisor or factor of a number (other than a perfect square) is greater than its square root, then the other factor will be less than its square root. Hence all multiples of primes greater than the square root of n need not be considered.

    from: http://britton.disted.camosun.bc.ca/jberatosthenes.htm

    So if i is greater than the square root of n, that means all remaining factors would have had a "pair" that we already found, below the square root of n. The check used, i * i <= n is equivalent but faster than doing a square root calculation.

    The reason this is so quick and other brute force methods are so slow is because this is dividing the number down in each step, which exponentially cuts the number of steps that need to be done.

    To see this, the prime factorization of 600851475143 is

    71 x 839 x 1471 x 6857

    and if you modify the code to read:

    n = 600851475143
    i = 2
    
    while i * i <= n:
        while n%i == 0:
            print "Dividing by %d" % i
            n = n / i
        i = i + 1
    
    if n > 1:
        print n
    

    You'll see:

    >>> 
    Dividing by 71
    Dividing by 839
    Dividing by 1471
    6857
    

    which shows you that this is exactly how it's working.