It's a well known fact that multiplication, integer division, and modulo by powers of two can be rewritten more efficiently as bitwise operations:
>>> x = randint(50000, 100000)
>>> x << 2 == x * 4
True
>>> x >> 2 == x // 4
True
>>> x & 3 == x % 4
True
In compiled languages such as C/C++ and Java, tests have shown that bitwise operations are generally faster than arithmetic operations. (See here and here). However, when I test these in Python, I am getting contrary results:
In [1]: from random import randint
...: nums = [randint(0, 1000000) for _ in range(100000)]
In [2]: %timeit [i * 8 for i in nums]
7.73 ms ± 397 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [3]: %timeit [i << 3 for i in nums]
8.22 ms ± 368 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [4]: %timeit [i // 8 for i in nums]
7.05 ms ± 393 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [5]: %timeit [i >> 3 for i in nums]
7.55 ms ± 367 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [6]: %timeit [i % 8 for i in nums]
5.96 ms ± 503 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [7]: %timeit [i & 7 for i in nums]
8.29 ms ± 816 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
As you can see, bitwise operations are slower than their arithmetic counterparts, especially for modulo. I repeated this test for another set of numbers, and got the same result. Is there a reason for this? These tests were in CPython 3.6.7, if that matters.
*
, %
, and /
all have fast paths for single-"limb" integers. <<
, >>
, and &
don't. They're going through the general-purpose arbitrary-precision code path.