
Is MPFR division faster than native integer division?


I've always assumed that integer division was faster than floating-point division, but some tests I ran suggest otherwise.

import gmpy2, time, math

digits = 100000

scale = 10**digits  # Decimal precision
gmpy2.get_context().precision = int(math.log2(10) * digits)  # Binary precision

def start_timer():
    global start_time
    start_time = time.perf_counter()  # monotonic clock, better suited to timing than time.time()

def print_timer():
    print("%s s" % (time.perf_counter() - start_time))

start_timer()
for i in range(1000):
    x = scale // 3
print_timer()

start_timer()
for i in range(1000):
    x = gmpy2.mpfr(1) / 3
print_timer()

start_timer()
for i in range(1000):
    x = gmpy2.mpfr(1) / gmpy2.mpfr(3)
print_timer()

The integer division took 0.17 seconds, the mpfr-by-integer division took 0.06 seconds, and the division of two floating-point numbers took 15.56 seconds.

My questions:

  1. Am I setting up this test correctly?
  2. Is mpfr division really more optimized than native division?
  3. Is division involving a floating point and an integer much faster than that involving two floating point numbers?

Solution

  • I'm using IPython to time some short examples and then I'll try to explain the results.

    from gmpy2 import mpfr, get_context
    get_context().precision=1000
    a=mpfr(1);b=mpfr(3)
    
    %timeit a/b
    1000000 loops, best of 3: 669 ns per loop
    %timeit a/3
    1000000 loops, best of 3: 464 ns per loop
    
    get_context().precision=10000
    a=mpfr(1);b=mpfr(3)
    
    %timeit a/b
    100000 loops, best of 3: 12.9 µs per loop
    %timeit a/3
    1000000 loops, best of 3: 1.33 µs per loop
    
    get_context().precision=100000
    a=mpfr(1);b=mpfr(3)
    
    %timeit a/b
    1000 loops, best of 3: 505 µs per loop
    %timeit a/3
    100000 loops, best of 3: 8.13 µs per loop
    

    Notice that as the precision increases, the running time for a/b grows more rapidly than the running time for a/3. When calculating a/b, MPFR uses the full precision of both values, and the running time is (roughly) O(n * ln(n)), where n is the length of a in bits. When calculating a/3, MPFR uses a short, but exact, representation of 3, and the running time is (roughly) O(n). This explains why a/b is slower than a/3 at high precision.
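
    As a rough check of this scaling argument, here is a small timeit-based sketch (my own, not part of the original answer) that repeats the a/b vs a/3 comparison at a few precisions. It assumes gmpy2 is installed and simply skips the measurement otherwise; absolute numbers will vary by machine, but a/b should fall further behind a/3 as precision grows.

    ```python
    import timeit

    try:
        from gmpy2 import mpfr, get_context
    except ImportError:  # gmpy2 may not be available; skip the measurement
        get_context = None

    results = []
    if get_context is not None:
        for prec in (1000, 10000, 100000):
            get_context().precision = prec
            a, b = mpfr(1), mpfr(3)
            # best-of-3 timing for 200 divisions each
            t_full = min(timeit.repeat(lambda: a / b, number=200, repeat=3))
            t_short = min(timeit.repeat(lambda: a / 3, number=200, repeat=3))
            results.append((prec, t_full, t_short))
            print(f"prec={prec:>6}: a/b {t_full:.6f}s  a/3 {t_short:.6f}s")
    ```

    At the highest precision, a/b should be dramatically slower than a/3, consistent with the O(n * ln(n)) versus O(n) behavior described above.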

    When Python calculates scale//3, it takes advantage of the fact that 3 fits into a single internal digit, so the running time is linear in the length of scale. This is effectively the same calculation as a/3, but since the underlying GMP library is faster than CPython's integer arithmetic, a/3 is computed faster than scale//3.

    Here is a short example of the difference in performance between Python and GMP.

    from gmpy2 import mpz
    scale = 10**100000
    
    %timeit scale//3
    10000 loops, best of 3: 162 µs per loop
    
    scale = mpz(scale)
    
    %timeit scale//3
    100000 loops, best of 3: 19 µs per loop
    

    When you compared a/b and a/3, you were measuring the difference between an n-by-n division and an n-by-k division, where n is the length of a in bits and k is much, much smaller than n. When you compared scale//3 and a/3, you were comparing a simple, straightforward division implementation with a highly optimized one.
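
    The same n-by-n versus n-by-k gap is visible with plain Python integers, no gmpy2 required. This illustrative sketch (the sizes are arbitrary choices of mine) divides a large number by a single-digit divisor and then by a divisor of comparable length:

    ```python
    import timeit

    n = 10**10000           # large dividend
    small = 3               # tiny divisor: n-by-k division, k << n
    big = 10**5000 + 1      # divisor of comparable length: n-by-n division

    t_small = min(timeit.repeat(lambda: n // small, number=100, repeat=3))
    t_big = min(timeit.repeat(lambda: n // big, number=100, repeat=3))
    print(f"n // 3:   {t_small:.6f}s")
    print(f"n // big: {t_big:.6f}s")

    # Sanity check: quotient and remainder still reconstruct the dividend.
    assert (n // small) * small + n % small == n
    ```

    The single-digit division runs in time linear in the length of n, while the division by a comparably sized number costs far more per operation, which is why comparing a/3 against a/b mixes two very different workloads.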

    Implementation note: In the current unstable development branch, a/3 calls mpfr_div_ui directly. This eliminates the creation of a temporary object by MPFR. This improves the performance as shown below.

    from gmpy2 import mpfr, get_context
    get_context().precision=1000
    a=mpfr(1);b=mpfr(3)
    
    %timeit a/b
    1000000 loops, best of 3: 593 ns per loop
    %timeit a/3
    1000000 loops, best of 3: 231 ns per loop
    
    get_context().precision=10000
    a=mpfr(1); b=mpfr(3)
    
    %timeit a/b
    100000 loops, best of 3: 12.7 µs per loop
    %timeit a/3
    1000000 loops, best of 3: 927 ns per loop
    
    get_context().precision=100000
    a=mpfr(1);b=mpfr(3)
    
    %timeit a/b
    1000 loops, best of 3: 505 µs per loop
    %timeit a/3
    100000 loops, best of 3: 6.77 µs per loop