I've always made the assumption that integer division was faster than floating point division, but I did some tests that seemed to prove otherwise.
import gmpy2, time, math
digits = 100000
scale = 10**digits # Decimal precision
gmpy2.get_context().precision = int(math.log2(10) * digits) # Binary precision
def start_timer():
global start_time
start_time = time.time()
def print_timer():
print("%s s" % (time.time() - start_time))
start_timer()
for i in range(1000):
x = scale // 3
print_timer()
start_timer()
for i in range(1000):
x = gmpy2.mpfr(1) / 3
print_timer()
start_timer()
for i in range(1000):
x = gmpy2.mpfr(1) / gmpy2.mpfr(3)
print_timer()
The integer division took 0.17 seconds, the mpfr division took 0.06 seconds, and the division between two floating point numbers took 15.56 seconds.
I'm using IPython to time some short examples and then I'll try to explain the results.
from gmpy2 import mpfr, get_context
get_context().precision=1000
a=mpfr(1);b=mpfr(3)
%timeit a/b
1000000 loops, best of 3: 669 ns per loop
%timeit a/3
1000000 loops, best of 3: 464 ns per loop
get_context().precision=10000
a=mpfr(1);b=mpfr(3)
%timeit a/b
100000 loops, best of 3: 12.9 µs per loop
%timeit a/3
1000000 loops, best of 3: 1.33 µs per loop
get_context().precision=100000
a=mpfr(1);b=mpfr(3)
%timeit a/b
1000 loops, best of 3: 505 µs per loop
%timeit a/3
100000 loops, best of 3: 8.13 µs per loop
Notice that as the precision increases, the running time for a/b
increases more rapidly than a/3
. When calculating a/b
, MPFR uses the full precision of both values and the running time is (roughly) O(n * ln(n)). When calculating a/3
, MPFR uses a short, but exact, representation of 3 and the running time is (roughly) O(n). This explains why a/b
is slower than a/3
for high precision. (n is the length of a
in bits.)
When Python calculates scale//3
, it takes advantage of the fact that 3 will fit into a single digit
and running time is linear in the length of scale
. This is effectively the same calculation as a/3
but since the underlying GMP library is faster than Python, a/3
is computed faster than scale//3
.
Here is a short example of the difference in performance between Python and GMP.
from gmpy2 import mpz
scale = 10**100000
%timeit scale//3
10000 loops, best of 3: 162 µs per loop
scale = mpz(scale)
%timeit scale//3
100000 loops, best of 3: 19 µs per loop
You were measuring the performance between an n
by n
division and an n
by k
division when you compared a/b
and a/3
. (n
is the length of a
in bits, and k
is much, much smaller than n
.) When you compared scale//3
and `a/3', you were comparing a simple, straight-forward division implementation with a highly-optimized implementation.
Implementation note: In the current unstable development branch, a/3
calls mpfr_div_ui
directly. This eliminates the creation of a temporary object by MPFR. This improves the performance as shown below.
from gmpy2 import mpfr, get_context
get_context().precision=1000
a=mpfr(1);b=mpfr(3)
%timeit a/b
1000000 loops, best of 3: 593 ns per loop
%timeit a/3
1000000 loops, best of 3: 231 ns per loop
get_context().precision=10000
a=mpfr(1); b=mpfr(3)
%timeit a/b
100000 loops, best of 3: 12.7 µs per loop
%timeit a/3
1000000 loops, best of 3: 927 ns per loop
get_context().precision=100000
a=mpfr(1);b=mpfr(3)
%timeit a/b
1000 loops, best of 3: 505 µs per loop
%timeit a/3
100000 loops, best of 3: 6.77 µs per loop