I have the following setup:
import numpy as np
import matplotlib.pyplot as plt
import timeit
import numba

@numba.jit(nopython=True, cache=True)
def f(x):
    summ = 0
    for i in x:
        summ += i
    return summ

@numba.jit(nopython=True)
def g21(N, locs):
    rvs = np.random.normal(loc=locs, scale=locs, size=N)
    res = f(rvs)
    return res

@numba.jit(nopython=False)
def g22(N, locs):
    rvs = np.random.normal(loc=locs, scale=locs, size=N)
    res = f(rvs)
    return res
g21 and g22 are exactly the same function, except that g21 uses nopython=True and g22 uses nopython=False.
Now I give them an input. If locs is a scalar, Numba should be able to compile everything, since it supports numpy.random.normal() with this signature. However, if locs is an array, Numba does not support this signature and should fall back to the Python interpreter.
I run this first just to compile the functions
N = 10_000
g22(N, 3)
g22(N, np.linspace(0,1,N))
g21(N, 3)
# g21(N, np.linspace(0,1,N)) # returns an error
Now I run a speed comparison
%timeit g21(N, 3)
%timeit g22(N, 3)
%timeit g22(N, np.linspace(0,1,N))
which returns
274 µs ± 3.43 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
270 µs ± 5.38 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
421 µs ± 54.3 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
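The %timeit lines above are IPython magics, so they only work in a notebook or IPython shell. As a rough sketch, the same kind of measurement can be made in a plain script with the timeit module. The helper name bench and its number/repeat values are assumptions chosen for illustration, and the import guard is only there so the snippet still runs (as plain Python) where Numba is not installed.

```python
import timeit

import numpy as np

try:
    import numba
    jit = numba.jit
except ImportError:
    # Assumption for this sketch: without Numba, fall back to plain Python.
    def jit(**options):
        return lambda func: func


@jit(nopython=True)
def f(x):
    summ = 0.0
    for i in x:
        summ += i
    return summ


@jit(nopython=True)
def g21(N, locs):
    rvs = np.random.normal(loc=locs, scale=locs, size=N)
    return f(rvs)


def bench(func, *args, number=200, repeat=5):
    """Return the best per-call time in seconds (hypothetical helper)."""
    func(*args)  # warm-up call, so JIT compilation is not part of the timing
    timer = timeit.Timer(lambda: func(*args))
    return min(timer.repeat(repeat=repeat, number=number)) / number


N = 10_000
best = bench(g21, N, 3)
print(f"g21(N, 3): {best * 1e6:.1f} us per call (best of 5 runs)")
```

The warm-up call matters: the first call of a jitted function includes compilation time, which would otherwise dominate the measurement.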
It makes sense that g22(N, np.linspace(0,1,N)) is the slowest, since it falls back to the Python interpreter.
However, what I don't understand is why g21(N, 3) is roughly the same speed as g22(N, 3), even though one has nopython=True and the other does not.
Moreover, g22 has the big advantage that it also accepts an array argument, as in g22(N, np.linspace(0,1,N)), so it is more versatile, and yet there is no speed penalty for having nopython=False.
- In this case, what is the use of nopython=True if a function with nopython=False achieves the same speed?
- In which specific cases is nopython=True better than nopython=False?
The documentation states:
Numba has two compilation modes: nopython mode and object mode. The former produces much faster code, but has limitations that can force Numba to fall back to the latter. To prevent Numba from falling back, and instead raise an error, pass nopython=True.
Note that Numba will try to compile the code to a native binary in both modes. However, nopython mode raises an error when this is not possible, while the other mode emits a warning and causes fallback code to be used. That is presumably why g21(N, 3) and g22(N, 3) run at the same speed: with a scalar locs, both compile to native code.
For some applications, performance is critical, so you really do not want the fallback code to be called. This is the case for high-performance applications, for example. An error is then better than code that runs for days instead of a few minutes on an expensive machine (such as a supercomputer or a computing server). A different version of Numba can also silently cause a fallback on some machines because a feature is not supported. I personally always use nopython mode to prevent such cases (the fallback code is generally too slow to be useful), and I consider object mode a bit useless. Put shortly, nopython offers stronger guarantees about performance.
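To make the "error instead of silent slowdown" point concrete, here is a minimal sketch that calls a nopython=True function with a scalar and with an array locs and reports what happens. The helper try_call is hypothetical, and so is the fallback NumbaError class, which exists only so the snippet still runs where Numba is not installed. In the question's setup the array call raised an error. Other Numba versions may accept it, so the sketch reports the outcome rather than assuming one.

```python
import numpy as np

try:
    import numba
    from numba.core.errors import NumbaError
    jit = numba.jit
except ImportError:
    # Assumption for this sketch only: without Numba everything runs as
    # plain Python, so no compilation error can occur.
    class NumbaError(Exception):
        pass

    def jit(**options):
        return lambda func: func


@jit(nopython=True)
def g21(N, locs):
    rvs = np.random.normal(loc=locs, scale=locs, size=N)
    summ = 0.0
    for v in rvs:
        summ += v
    return summ


def try_call(func, *args):
    """Call func and report whether it compiled and ran (hypothetical helper)."""
    try:
        func(*args)
        return "ok"
    except NumbaError as exc:
        # With nopython=True, unsupported code surfaces here as an error
        # instead of running slowly in the background.
        return f"failed fast: {type(exc).__name__}"


N = 10_000
scalar_status = try_call(g21, N, 3.0)
array_status = try_call(g21, N, np.linspace(0.1, 1.0, N))
print("scalar locs:", scalar_status)
print("array locs: ", array_status)
```

Catching the error at the call site like this is one way to decide explicitly what should happen for unsupported inputs, rather than leaving that decision to a fallback.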