Search code examples
calgorithmrecursionprime-factoring

Not getting proper output from Pollard's rho algorithm implementation


I don't know where I am doing wrong in trying to calculate prime factorizations using Pollard's rho algorithm.

#include<stdio.h>
#define f(x)  x*x-1

int pollard( int );
int gcd( int, int);

int main( void ) {
    int n;
    scanf( "%d",&n );
    pollard( n );
    return 0;  
}

int pollard( int n ) {
    int i=1,x,y,k=2,d;
    x = rand()%n;
    y = x;

    while(1) {
        i++;
        x = f( x ) % n;
        d = gcd( y-x, n);

        if(d!=1 && d!=n)
            printf( "%d\n", d);

        if(i == k) {
            y = x;
            k = 2 * k;
        }
    }
}   
int gcd( int a, int b ) {

    if( b == 0) 
        return a;
    else 
        return gcd( b, a % b);
}

Solution

  • One immediate problem is, as Peter de Rivaz suspected the

    #define f(x)  x*x-1
    

    Thus the line

    x = f(x)%n;
    

    becomes

    x = x*x-1%n;
    

    and the precedence of % is higher than that of -, hence the expression is implicitly parenthesised as

    x = (x*x) - (1%n);
    

    which is equivalent to x = x*x - 1; (I assume n > 1, anyway it's x = x*x - constant;) and if you start with a value x >= 2, you have overflow before you had a realistic chance of finding a factor:

    2 -> 2*2-1 = 3 -> 3*3 - 1 = 8 -> 8*8 - 1 = 63 -> 3968 -> 15745023 -> overflow if int is 32 bits

    That doesn't immediately make it impossible that gcd(y-x,n) is a factor, though. It just makes it likely that at a stage where theoretically, you would have found a factor, the overflow destroys the common factor that mathematically would exist - more likely than a common factor introduced by overflow.

    Overflow of signed integers is undefined behaviour, so there are no guarantees how the programme behaves, but usually it behaves consistently so the iteration of f still produces a well-defined sequence for which the algorithm in principle works.

    Another problem is that y-x will frequently be negative, and then the computed gcd can also be negative - often -1. In that case, you print -1.

    And then, it is a not too rare occurrence that iterating f from a starting value doesn't detect a common factor because the cycles modulo both prime factors (for the example of n a product of two distinct primes) have equal length and are entered at the same time. You make no attempt at detecting such a case; whenever gcd(|y-x|, n) == n, any further work in that sequence is pointless, so you should break out of the loop when d == n.

    Also, you never check whether n is a prime, in which case trying to find a factor is a futile undertaking from the start.

    Furthermore, after fixing f(x) so that the % n applies to the complete result of f(x), you have the problem that x*x still overflows for relatively small x (with the standard signed 32-bit ints, for x >= 46341), so factoring larger n may fail due to overflow. At least, you should use unsigned long long for the computations, so that overflow is avoided for n < 2^32. However, factorising such small numbers is typically done more efficiently with trial division. Pollard's Rho method and other advanced factoring algorithms are meant for larger numbers, where trial division is no longer efficient or even feasible.