python, pandas, optimization, numba

Why does my pandas + Numba code perform worse than pandas + pure Python code?


In the code below I am trying to apply a function to each cell of a DataFrame. Runtime measurements show that the Numba code is 6-7x slower than pure Python when the matrix size is 1000x1000, and 2-3x slower when it is 10,000x10,000. I have also run the code several times to make sure that compilation time does not affect the overall runtime. What am I missing?

import time
import numpy as np
import pandas as pd
from numba import jit

vcf = pd.DataFrame(np.full(shape=(10_000,10_000), fill_value='./.'))

time1 = time.perf_counter()
@jit(cache=True)
def jit_func(x):
    if x == './.':
        return 1
    else:
        return 0
vcf.applymap(jit_func)
print('JIT', time.perf_counter() - time1)

time1 = time.perf_counter()
vcf.applymap(lambda x: 1 if x=='./.' else 0)
print('LAMBDA', time.perf_counter() - time1)

time1 = time.perf_counter()
def python_func(x):
    if x == './.':
        return 1
    else:
        return 0
vcf.applymap(python_func)
print('PYTHON', time.perf_counter() - time1)

Output:

JIT 464.7864613599959
LAMBDA 158.36754451994784
PYTHON 122.22150028299075

Solution

  • Calling a Numba function from Python has a higher overhead than calling a pure-Python function. This is because Numba needs to check that the function has been compiled and then call a wrapping conversion function. The conversion function converts pure-Python types to native ones, so that Numba does not operate on pure-Python objects (bound to the GIL) and can therefore produce more efficient code. This wrapping is pretty fast for simple types like integers. However, it is currently very expensive for strings. In fact, most string operations in Numba are currently very slow, and AFAIK there is no plan to make them fast soon (Numba is meant for numerical computations, not string processing). Besides, string-based functions are also significantly slower to compile.

    The key point is to use bytes instead. This requires conversions (e.g. using encode), and bytes can only be (safely) used for ASCII characters (i.e. no unicode). That being said, note that unicode operations are generally slow anyway, and CPython is not so bad in this regard since it has been fairly well optimised to handle unicode strings efficiently.

    Here are some benchmarks to support the above explanation:

    import numpy as np
    from numba import njit
    
    # Eagerly compile the function so it is compiled before being called
    @njit('void()')
    def fn_nb():
        pass
    
    def fn_python():
        pass
    
    %timeit fn_nb()      # 56.5 ns ± 0.573 ns/loop
    %timeit fn_python()  # 50.7 ns ± 0.386 ns/loop => slightly faster than fn_nb
    
    # ----------------------------------------------------------------------------
    
    @njit('void(int64)')
    def fn_nb_with_int_param(useless_param):
        pass
    
    %timeit fn_nb_with_int_param(123)  # 118 ns ± 0.525 ns/loop => parameters add overhead
    
    # ----------------------------------------------------------------------------
    
    @njit('void(int64[:])')
    def fn_nb_with_arr_param(param):
        pass
    
    arr = np.array([], dtype=np.int64)
    %timeit fn_nb_with_arr_param(arr)  # 242 ns ± 2.01 ns/loop => arrays are more expensive
    
    # ----------------------------------------------------------------------------
    
    @njit('void(unicode_type)')
    def fn_nb_with_str_param(param):
        pass
    
    s = ''
    %timeit fn_nb_with_str_param(s)    # 1.79 µs ± 7.51 ns/loop => MUCH slower with strings
    
    # ----------------------------------------------------------------------------
    
    @njit('int64(unicode_type)')
    def fn_nb_with_str_and_body(param):
        if param == './.':
            return 1
        else:
            return 0
    
    s = './123'
    %timeit fn_nb_with_str_and_body(s) # 1.84 µs ± 11.2 ns/loop => just a bit slower
    
    # ----------------------------------------------------------------------------
    
    @njit # I do not know the string signature for this one
    def fn_nb_with_bytes_params(param):
        pass
    
    s = b''
    fn_nb_with_bytes_params(s) # Force Numba to compile the function
    %timeit fn_nb_with_bytes_params(s)  # 255 ns ± 2.23 ns/loop => much faster than strings
    
    # ----------------------------------------------------------------------------
    
    @njit
    def fn_nb_with_bytes_and_body(param):
        if param == './.':
            return 1
        else:
            return 0
    
    s = b'./123'
    fn_nb_with_bytes_and_body(s)
    %timeit fn_nb_with_bytes_and_body(s)  # 259 ns ± 3.42 ns/loop => still fast!
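
    As a rough sketch of how the bytes trick could be wired into the question's code: the encode pass is itself a per-cell Python call, so the net benefit mainly comes when the converted data is reused or processed column-wise. The ~7x figure below refers to the per-call benchmark just above.

    vcf_bytes = vcf.applymap(lambda s: s.encode('ascii'))   # one-time conversion to ASCII bytes
    vcf_bytes.applymap(fn_nb_with_bytes_and_body)           # roughly 7x cheaper per call than the unicode version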
    

    Performance-wise, it is often better to avoid strings like the plague (especially unicode) if possible. There are a few tricks to do that. One is to convert string columns containing many identical strings to categorical columns (internally they are stored as integers plus an int/string table for the labels), as sketched just below.
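
    A minimal sketch of that idea, assuming the goal is just to flag './.' cells; the comparison then happens on integer codes rather than on strings:

    import numpy as np
    import pandas as pd

    col = pd.Series(np.full(1_000_000, './.', dtype=object))
    cat = col.astype('category')                 # each distinct string stored once; cells become integer codes
    codes = cat.cat.codes.to_numpy()             # small integer array, cheap to scan
    target = cat.cat.categories.get_loc('./.')   # integer code of './.' (raises KeyError if absent)
    mask = (codes == target).astype(np.int64)    # 1 where './.', else 0 -- pure integer work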

    If you really need to work with unicode strings, then Cython is certainly better suited for that.

    Finally, calling a Numba function 100_000_000 times is not efficient. In fact, it is inefficient even in native languages like C/C++ (unless the function is inlined). It is better to get the data of a given column and call a native/compiled/Python function once; that function can then iterate over the items of the column. Pandas currently stores string columns inefficiently. The Pandas developers plan to improve this in the future, but for now we have to pay either the cost of a conversion (string column to a native type) or the cost of operating directly on CPython objects (not much faster than using a pure-Python function). The sketch below illustrates the one-call-per-column idea.
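
    A minimal sketch of that approach on the question's DataFrame: one native (NumPy) call per column instead of one Python/Numba call per cell.

    import numpy as np
    import pandas as pd

    vcf = pd.DataFrame(np.full(shape=(10_000, 10_000), fill_value='./.'))

    def flag_missing(column_values):
        # one vectorised comparison over the whole column, dispatched once to NumPy's C code
        return (column_values == './.').astype(np.int8)

    result = pd.DataFrame({name: flag_missing(col.to_numpy()) for name, col in vcf.items()})
    # or, in a single call for the whole frame:
    # result = pd.DataFrame((vcf.to_numpy() == './.').astype(np.int8))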