Search code examples
pythonfactorization

Speeding up my Fermat Factorization function (Python)


My task is to factor very large composite numbers using Fermat's factorization method. The numbers are 1024 bits large, which is around 309 decimal digits.

I have come up with the Python code below, which uses the gmpy2 module for accuracy. It is simply a Python implementation of the pseudo-code shown on the Wikipedia page. I read the "Sieve Improvement" section on that page, but wasn't sure how to implement it.

def fermat_factor(n):
    assert n % 2 != 0  # Odd integers only

    a = gmpy2.ceil(gmpy2.sqrt(n))
    b2 = gmpy2.square(a) - n
    while not is_square(b2):
        a += 1
        b2 = gmpy2.square(a) - n

    factor1 = a + gmpy2.sqrt(b2)
    factor2 = a - gmpy2.sqrt(b2)
    return int(factor1), int(factor2) 

def is_square(n):
    root = gmpy2.sqrt(n)
    return root % 1 == 0  # '4.0' will pass, '4.1212' won't

This code runs fairly fast for small numbers, but takes much too long for numbers as large as those given in the problem. How can I improve the speed of this code? I'm not looking for people to write my code for me, but would appreciate some suggestions. Thank you for any responses.


Solution

  • Consider rewriting this script to use only integers instead of arbitrary precision floats.

    gmpy has support for integer square root (returns the floor of the square root, calculated efficiently). This can be used for the is_square() function by testing if the square of the square root equals the original.

    I'm not sure about gmpy2, but in gmpy.sqrt() requires an integer argument, and returns an integer output. If you are using floats, then that is probably your problem (since floats are very slow as compared to integers, especially when using extended precision). If you are in fact using integers, then is_square() must be doing a tedious conversion from integer to float every time it is called (and gmpy2.sqrt() != gmpy.sqrt()).

    For those of you who keep saying that this is a difficult problem, keep in mind that using this method was a hint: The fermat factorization algorithm is based on a weakness present when the composite number to be factored has two prime factors which are close to each other. If this was given as a hint, it is likely that the entity posing the problem knows this to be the case.

    Edit: Apparently, gmpy2.isqrt() is the same as gmpy.sqrt() (the integer version of sqrt), and gmpy2.sqrt() is the floating-point version.