
why is numba dict lookup so much slower than cpython?


I ran the following simple experiment:

import numpy as np
from numba import njit

@njit
def foo():
    # compiled function returning a Numba typed dict
    d = dict()
    d[(1.0, 2)] = np.random.rand(500)
    return d

e = dict()  # plain CPython dict for comparison
e[(1.0, 2)] = np.random.rand(500)
d = foo()

e and d are both dictionaries with one key each.

In [42]: %timeit e[(1.0, 2)][3]
123 ns ± 2.86 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
In [43]: %timeit d[(1.0, 2)][3]
1.99 µs ± 50.3 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

Why is the numba dict 16 times slower?

This is with numba version 0.56.4.


Solution

  • It's probably not the dictionary lookup itself that's slow, but the context switch from Python -> Numba -> Python that adds significant overhead (for such a simple function).

    Python dictionaries are also very fast and well optimized, so as long as you stay within Python, they're hard to beat to begin with.
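    One way to see that overhead in isolation (this is a minimal sketch, not part of the original benchmark, and the exact numbers will depend on your machine and Numba version) is to time a jitted function that does nothing at all, so that whatever you measure is essentially pure call overhead:

    from numba import njit
    import timeit

    @njit
    def empty():
        # no work at all: any measured time is the cost of dispatching the
        # call and crossing the interpreter/compiled-code boundary
        return None

    empty()  # warm up so compilation time isn't included in the measurement

    # approximate per-call cost in seconds; compare it to the ~2 µs lookup above
    print(timeit.timeit(empty, number=1_000_000) / 1_000_000)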

    Whether overhead like that matters really depends on the actual use case. Here's an alternative test, doing 1000 lookups inside a jitted function:

    from numba import njit

    @njit
    def create_numba_dict():
        # typed dict with (int, int) tuple keys, built inside compiled code
        return {(i, i): i for i in range(1000)}

    @njit
    def index_numba_dict(d):
        # perform 1000 lookups and accumulate the results so the work is
        # actually used
        total = 0.0
        for i in range(1000):
            total += d[(i, i)]
        return total
    

    If I benchmark this with a warmup (to eliminate compilation overhead), these are the results I get:

    nb_dict = create_numba_dict()
    index_numba_dict(nb_dict)
    %timeit index_numba_dict(nb_dict)
    
    py_dict = {(i,i): i for i in range(1000)}
    %timeit sum([py_dict[(i,i)] for i in range(1000)])
    

    (benchmark results were shown as a screenshot in the original post)

    There's of course a little more going on than only the lookup. In my experience you have to do something with the result, otherwise the compiler is sometimes clever enough to never even execute it. And it still isn't a very realistic scenario.

    But I think it does show that, in the context of Numba, it's not the lookup that's intrinsically slow. If it were, each lookup would keep adding that cost to the total duration, regardless of how many lookups are performed.
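    As a practical follow-up (a hedged sketch using the d returned by foo() in the question, not something from the original answer), you can keep both the lookup and the work that consumes its result inside a single jitted function, so the Python/Numba boundary is crossed once per call instead of once per lookup:

    from numba import njit

    @njit
    def use_entry(d):
        # the lookup and the reduction both run in compiled code;
        # d is assumed to be the typed dict returned by foo()
        arr = d[(1.0, 2)]
        return arr.sum()

    result = use_entry(d)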