Why does printing once every 100k iterations ruin the numba performance?

Why does this code, which prints only once every 100k iterations (i.e. only 40 lines are printed!), take 50 seconds to run:

import numpy as np
from numba import jit

@jit
def doit():
    A = np.random.random(4*1000*1000)
    n = 300
    Q = np.zeros(len(A)-n)
    for i in range(len(Q)):
        Q[i] = np.sum(A[i:i+n] <= A[i+n])
        if i % 100000 == 0:  # print the progress once every 100k iterations
            print("%i %.2f %% already done. " % (i, i * 100.0 / len(A)))

doit()

whereas, without the print, it only takes 2.4 seconds:

import numpy as np
from numba import jit

@jit
def doit():
    A = np.random.random(4*1000*1000)
    n = 300
    Q = np.zeros(len(A)-n)
    for i in range(len(Q)):
        Q[i] = np.sum(A[i:i+n] <= A[i+n])

doit()

Is it generally true that a single print can wipe out the benefit of numba?

Solution

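If you compile it with @njit or @jit(nopython=True) instead, the exception you get shows that the function can't be compiled in nopython mode, so plain @jit was falling back to the much slower object mode. The culprit is the %-style string formatting inside the print, which nopython mode does not support. This version passes the values to print as separate arguments instead, and runs in about 1 sec on my machine with the print statement: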

import numpy as np
from numba import jit

@jit(nopython=True)
def doit():
    A = np.random.random(4*1000*1000)
    n = 300
    Q = np.zeros(len(A)-n)
    for i in range(len(Q)):
        Q[i] = np.sum(A[i:i+n] <= A[i+n])
        if i % 100000 == 0:  # print the progress once every 100k iterations
            # separate print() arguments instead of %-formatting, which nopython mode can't compile
            print(i, "(", i * 100.0 / len(A), "% already done)")

doit()

In general, if you are seeing poor performance from a numba function, it is usually because it is being compiled in Python object mode. Plain @jit falls back to object mode whenever it runs into some bit of syntax that the compiler can't compile down to machine code, so always passing nopython=True (or using @njit) is good practice unless you really do want object mode: an unsupported construct then raises an error instead of quietly slowing everything down. Numba does do some loop lifting in object mode, but its effect on performance is harder to reason about.

See:

http://numba.pydata.org/numba-doc/latest/user/5minguide.html#what-is-nopython-mode