
Is MPFR division faster than native integer division?


I've always assumed that integer division was faster than floating-point division, but some tests I ran suggest otherwise.

import gmpy2, time, math

digits = 100000

scale = 10**digits  # Decimal precision
gmpy2.get_context().precision = int(math.log2(10) * digits)  # Binary precision

def start_timer():
    global start_time
    start_time = time.perf_counter()  # monotonic clock, better suited to timing than time.time()

def print_timer():
    print("%s s" % (time.perf_counter() - start_time))

start_timer()
for i in range(1000):
    x = scale // 3
print_timer()

start_timer()
for i in range(1000):
    x = gmpy2.mpfr(1) / 3
print_timer()

start_timer()
for i in range(1000):
    x = gmpy2.mpfr(1) / gmpy2.mpfr(3)
print_timer()

The integer division took 0.17 seconds, the mpfr-by-integer division took 0.06 seconds, and the division of two floating-point numbers took 15.56 seconds.

My questions:

  1. Am I setting up this test correctly?
  2. Is mpfr division really more optimized than native division?
  3. Is division involving a floating point and an integer much faster than that involving two floating point numbers?

Solution

  • I'm using IPython to time some short examples and then I'll try to explain the results.

    from gmpy2 import mpfr, get_context
    get_context().precision=1000
    a=mpfr(1);b=mpfr(3)
    
    %timeit a/b
    1000000 loops, best of 3: 669 ns per loop
    %timeit a/3
    1000000 loops, best of 3: 464 ns per loop
    
    get_context().precision=10000
    a=mpfr(1);b=mpfr(3)
    
    %timeit a/b
    100000 loops, best of 3: 12.9 µs per loop
    %timeit a/3
    1000000 loops, best of 3: 1.33 µs per loop
    
    get_context().precision=100000
    a=mpfr(1);b=mpfr(3)
    
    %timeit a/b
    1000 loops, best of 3: 505 µs per loop
    %timeit a/3
    100000 loops, best of 3: 8.13 µs per loop
    

    Notice that as the precision increases, the running time for a/b grows more rapidly than the running time for a/3. When calculating a/b, MPFR uses the full precision of both values, and the running time is (roughly) O(n * ln(n)), where n is the length of a in bits. When calculating a/3, MPFR uses a short, but exact, representation of 3, and the running time is (roughly) O(n). This explains why a/b is slower than a/3 at high precision.
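
    As a rough check of this scaling argument, here is a small timeit-based sketch (my own, not part of the original answer) that repeats the a/b vs a/3 comparison at a few precisions. It assumes gmpy2 is installed and simply skips the measurement otherwise; absolute numbers will vary by machine, but a/b should fall further behind a/3 as precision grows.

    ```python
    import timeit

    try:
        from gmpy2 import mpfr, get_context
    except ImportError:  # gmpy2 may not be available; skip the measurement
        get_context = None

    results = []
    if get_context is not None:
        for prec in (1000, 10000, 100000):
            get_context().precision = prec
            a, b = mpfr(1), mpfr(3)
            # best-of-3 timing for 200 divisions each
            t_full = min(timeit.repeat(lambda: a / b, number=200, repeat=3))
            t_short = min(timeit.repeat(lambda: a / 3, number=200, repeat=3))
            results.append((prec, t_full, t_short))
            print(f"prec={prec:>6}: a/b {t_full:.6f}s  a/3 {t_short:.6f}s")
    ```

    At the highest precision, a/b should be dramatically slower than a/3, consistent with the O(n * ln(n)) versus O(n) behavior described above.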

    When Python calculates scale//3, it takes advantage of the fact that 3 fits into a single internal digit, so the running time is linear in the length of scale. This is effectively the same calculation as a/3, but since the underlying GMP library is faster than CPython's integer arithmetic, a/3 is computed faster than scale//3.

    Here is a short example of the difference in performance between Python and GMP.

    from gmpy2 import mpz
    scale = 10**100000
    
    %timeit scale//3
    10000 loops, best of 3: 162 µs per loop
    
    scale = mpz(scale)
    
    %timeit scale//3
    100000 loops, best of 3: 19 µs per loop
    

    When you compared a/b and a/3, you were measuring the difference between an n-by-n division and an n-by-k division, where n is the length of a in bits and k is much, much smaller than n. When you compared scale//3 and a/3, you were comparing a simple, straightforward division implementation with a highly optimized one.
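
    The same n-by-n versus n-by-k gap is visible with plain Python integers, no gmpy2 required. This illustrative sketch (the sizes are arbitrary choices of mine) divides a large number by a single-digit divisor and then by a divisor of comparable length:

    ```python
    import timeit

    n = 10**10000           # large dividend
    small = 3               # tiny divisor: n-by-k division, k << n
    big = 10**5000 + 1      # divisor of comparable length: n-by-n division

    t_small = min(timeit.repeat(lambda: n // small, number=100, repeat=3))
    t_big = min(timeit.repeat(lambda: n // big, number=100, repeat=3))
    print(f"n // 3:   {t_small:.6f}s")
    print(f"n // big: {t_big:.6f}s")

    # Sanity check: quotient and remainder still reconstruct the dividend.
    assert (n // small) * small + n % small == n
    ```

    The single-digit division runs in time linear in the length of n, while the division by a comparably sized number costs far more per operation, which is why comparing a/3 against a/b mixes two very different workloads.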

    Implementation note: In the current unstable development branch, a/3 calls mpfr_div_ui directly. This eliminates the creation of a temporary object by MPFR. This improves the performance as shown below.

    from gmpy2 import mpfr, get_context
    get_context().precision=1000
    a=mpfr(1);b=mpfr(3)
    
    %timeit a/b
    1000000 loops, best of 3: 593 ns per loop
    %timeit a/3
    1000000 loops, best of 3: 231 ns per loop
    
    get_context().precision=10000
    a=mpfr(1); b=mpfr(3)
    
    %timeit a/b
    100000 loops, best of 3: 12.7 µs per loop
    %timeit a/3
    1000000 loops, best of 3: 927 ns per loop
    
    get_context().precision=100000
    a=mpfr(1);b=mpfr(3)
    
    %timeit a/b
    1000 loops, best of 3: 505 µs per loop
    %timeit a/3
    100000 loops, best of 3: 6.77 µs per loop