Pollard Rho crashes on inputs that are not too big

I implemented the Pollard Rho algorithm given on wikipedia

x ← 2; y ← 2; d ← 1
while d = 1:
    x ← g(x)
    y ← g(g(y))
    d ← gcd(|x - y|, n)
if d = n: 
    return failure
else:
    return d

Large inputs give the error:

GNU MP: Cannot allocate memory (size=4294950944)

Here is my implementation

mpz_class factor(mpz_class num)
{
    mpz_class x(2), y(2), d(1);
    while(d == 1)
    {
        x = polynomial(x);
        y = polynomial(polynomial(y));
        mpz_class diff = x - y;
        if(diff < 0)
        {
            diff *= -1;
        }
        mpz_gcd(d.get_mpz_t(), diff.get_mpz_t(), num.get_mpz_t());
    }
    if(d == num)
    {
        return -1;//failure
    }
    else
    {
        return d;//found factor
    }
}

mpz_class polynomial (mpz_class x)
{
    return ((x * x) + 1);
}

It works on inputs like 121 but crashes on 5540987. Did I do something wrong? Is there a way this can be improved so such numbers can be factored? I've seen some implementations that appear to use the polynomial ((x*x)%N+c)%N (note the extra mod n). Does this work because any polynomial could be used?

Solution

Having two modulo operations there is a redundant, but having one of them fixes precisely this issue of the numbers exploding in size unless the algorithm terminates very early (which it does for 121).

Does this work because any polynomial could be used?

It's a bit more subtle, and throwing modulo operations into the mix is not really a case of "any polynomial". The key is that what the algorithm looks for is two values in some sequence x[i] and x[j] with i != j such that abs(x[i] - x[j]) is a multiple of p (where N = pq and neither p nor q are 1), or in other words, abs(x[i] - x[j]) mod p = 0 or x[i] ≡ x[j] mod p. At that point a cycle has been found in the sequence when viewed modulo p, and importantly if x[i] != x[j] then their difference is a non-zero multiple of p, which gives a chance extract a factor from N .. at least if their difference is not also a multiple of N (in that case the result from the GCD would be N itself and no factor comes out).

So looking purely at the math the modulo N step is theoretically unnecessary, the cycling modulo p happens without such "help". But it is possible, N = pq so if we reduce the sequence modulo N, then its properties modulo p are not disturbed and the algorithm still works. More than that, the reduction modulo N is very important practically because it stops the numbers involved from becoming impractically large, which would otherwise not only slow down the algorithm but also eventually fail on actual (finite size) hardware.

That's a lot of theoretical justification, the implementation is really simple,

mpz_class polynomial (mpz_class x, mpz_class num)
{
    return ((x * x) + 1) % num;
}