Why does this code, with a print once every 100k iterations (i.e. only 40 lines are printed!), take 50 seconds to run:
import numpy as np
from numba import jit

@jit
def doit():
    A = np.random.random(4*1000*1000)
    n = 300
    Q = np.zeros(len(A)-n)
    for i in range(len(Q)):
        Q[i] = np.sum(A[i:i+n] <= A[i+n])
        if i % 100000 == 0:  # print the progress once every 100k iterations
            print("%i %.2f %% already done. " % (i, i * 100.0 / len(A)))

doit()
whereas, without the print, it only takes 2.4 seconds:
import numpy as np
from numba import jit

@jit
def doit():
    A = np.random.random(4*1000*1000)
    n = 300
    Q = np.zeros(len(A)-n)
    for i in range(len(Q)):
        Q[i] = np.sum(A[i:i+n] <= A[i+n])

doit()
Is it a general fact that print can really remove the benefit of numba?
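If you try to compile it with @njit or @jit(nopython=True), compilation fails with an exception, which tells you that the plain @jit version was falling back to object mode. The version below passes the values to print as separate arguments instead of building a %-formatted string; it compiles in nopython mode and runs in about 1 sec on my machine with the print statement: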
import numpy as np
from numba import jit

@jit(nopython=True)
def doit():
    A = np.random.random(4*1000*1000)
    n = 300
    Q = np.zeros(len(A)-n)
    for i in range(len(Q)):
        Q[i] = np.sum(A[i:i+n] <= A[i+n])
        if i % 100000 == 0:  # print the progress once every 100k iterations
            print(i , "(", i * 100.0 / len(A), '% already done)')
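When comparing timings like the ones quoted here, keep in mind that the first call to a jitted function also pays for compilation. The following is a minimal sketch of measuring the first and second call separately. The names doit_small and time_call and the reduced array size are assumptions made for illustration, and the try/except fallback is only there so the sketch still runs where numba is not installed.

```python
import time
import numpy as np

try:
    from numba import njit  # shorthand for jit(nopython=True)
except ImportError:
    # Assumption for illustration: without numba, run as plain Python.
    def njit(func):
        return func

@njit
def doit_small(size, n):
    # Same loop as in the answer, on a smaller array and without the print.
    A = np.random.random(size)
    Q = np.zeros(len(A) - n)
    for i in range(len(Q)):
        Q[i] = np.sum(A[i:i + n] <= A[i + n])
    return Q

def time_call(func, *args):
    # Hypothetical helper: return the result and the elapsed wall-clock time.
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start

_, first = time_call(doit_small, 20000, 300)   # with numba: includes compilation
Q, second = time_call(doit_small, 20000, 300)  # with numba: compiled code only
print("first call: %.3f s, second call: %.3f s" % (first, second))
```

With numba installed, the second figure is the one to compare between variants, since it excludes the one-off compile time.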
In general, if you are seeing poor performance from a numba function, the most likely reason is that it is being compiled in python object mode. Plain @jit falls back to object mode whenever it runs into some bit of syntax that the compiler can't compile down to machine code, so always putting nopython=True is good practice unless you really want object mode: you then get an error instead of a silent slowdown. Numba does do some loop lifting in object mode, but that's harder to reason about in terms of performance.
See:
http://numba.pydata.org/numba-doc/latest/user/5minguide.html#what-is-nopython-mode
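An alternative that keeps the original %-formatted progress message is to move the print out of the compiled function altogether. This is not what the answer above does; it is a minimal sketch under the assumption that the work can be split into chunks. The names count_chunk and doit_chunked and the chunk parameter are hypothetical, and the try/except fallback only exists so the sketch runs without numba installed.

```python
import numpy as np

try:
    from numba import njit  # shorthand for jit(nopython=True)
except ImportError:
    # Assumption for illustration: without numba, run as plain Python.
    def njit(func):
        return func

@njit
def count_chunk(A, Q, n, start, stop):
    # For each i in [start, stop): count how many of A[i:i+n] are <= A[i+n].
    for i in range(start, stop):
        count = 0
        for j in range(i, i + n):
            if A[j] <= A[i + n]:
                count += 1
        Q[i] = count

def doit_chunked(size=4 * 1000 * 1000, n=300, chunk=100000):
    A = np.random.random(size)
    Q = np.zeros(len(A) - n)
    for start in range(0, len(Q), chunk):
        stop = min(start + chunk, len(Q))
        # Progress is reported from ordinary Python, so any formatting works.
        print("%i %.2f %% already done." % (start, start * 100.0 / len(A)))
        count_chunk(A, Q, n, start, stop)
    return A, Q
```

Calling doit_chunked() with the defaults mirrors the sizes in the question; without numba the default size is far too slow, so try something like doit_chunked(size=2000, n=30, chunk=500) first. The compiled function contains nothing but numeric code, so it should not need object mode, and the progress line is still printed once every 100k iterations.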