Tags: python, c++, pybind11

pybind11 C++ unordered_map 10x slower than Python dict?


I exposed a C++ unordered_map<string, int> to Python, and it turned out this map is about 10x slower than Python's dict.

See code below.

// map.cpp file
#include <pybind11/pybind11.h>
#include <pybind11/stl.h>
#include <pybind11/stl_bind.h>
#include <string>
#include <unordered_map>

namespace py = pybind11;

PYBIND11_MAKE_OPAQUE(std::unordered_map<std::string, int>);

PYBIND11_MODULE(map, m) {
    // Bind the opaque unordered_map as a dict-like Python class named MapStr2Int
    py::bind_map<std::unordered_map<std::string, int>>(m, "MapStr2Int");
}

On macOS, compile it with this command:

c++ -O3 -std=c++14 -shared -fPIC -Wl,-undefined,dynamic_lookup $(python3 -m pybind11 --includes) map.cpp -o map$(python3-config --extension-suffix)

Finally, compare it with a Python dict in IPython:

In [19]: import numpy as np

In [20]: import map

In [21]: c_dict = map.MapStr2Int()

In [22]: for i in range(100000):
    ...:     c_dict[str(i)] = i
    ...:

In [23]: py_dict = {w:i for w,i in c_dict.items()}

In [24]: arr = [str(i) for i in np.random.randint(0,100000, 100)]

In [25]: %timeit [c_dict[w] for w in arr]
59 µs ± 2 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [26]: %timeit [py_dict[w] for w in arr]
6.58 µs ± 87.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

As the timings show, c_dict is roughly 10x slower than the plain Python py_dict.

Why is that, and how can it be improved?


Solution

  • You are comparing a native dictionary implementation (the one built into CPython) with a pybind11-wrapped one. A C++ program that uses std::unordered_map directly is almost certainly faster than an equivalent program written in Python using a dict.

    But that is not what you are doing here. Instead, you ask pybind11 to generate a wrapper that converts Python types into C++ ones, calls the C++ standard library methods, and then converts the result back into a Python type. Each c_dict[w] lookup therefore pays for a str-to-std::string conversion on the way in and an int-to-Python-object conversion on the way out; those conversions typically require allocation and deallocation and take measurable time (one way to amortize them is batching, as sketched at the end of this answer). Moreover, pybind11 is a very clever (hence complex) tool; you cannot expect it to generate code as optimized as direct calls through the Python API.

    Unless you use a specially optimized hashing algorithm, you will not be able to write C or C++ code that beats the built-in dict, because the built-in types are already written in C. At most, you can hope to be as fast as the standard library if you mimic it and use the Python/C API directly (a sketch doing that through pybind11's py::dict wrapper follows below).
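    One way to reduce that per-lookup cost, if your access pattern allows it, is to cross the language boundary once for a whole batch of keys instead of once per key. The following is only a sketch, not part of the original module: get_many is a hypothetical helper added inside PYBIND11_MODULE(map, m), and it relies on the <pybind11/stl.h> header (which also pulls in the std::vector casters) that map.cpp already includes.

    // Hypothetical addition inside PYBIND11_MODULE(map, m):
    // convert the key list once, do every lookup in C++, convert the result list once.
    m.def("get_many",
          [](const std::unordered_map<std::string, int> &map,
             const std::vector<std::string> &keys) {
              std::vector<int> out;
              out.reserve(keys.size());
              for (const auto &k : keys)
                  out.push_back(map.at(k));  // throws std::out_of_range if a key is missing
              return out;                    // converted back to a Python list in one go
          });

    Called as map.get_many(c_dict, arr) from Python, this pays the wrapper overhead once per call rather than once per element, which usually narrows the gap considerably for bulk lookups.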
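    If the data only ever needs to be consumed from Python, another option is to skip std::unordered_map entirely and build a real CPython dict from C++ through pybind11's py::dict wrapper, which calls the Python/C API (e.g. PyDict_SetItem) underneath. The snippet below is a minimal sketch under that assumption; the module name fastmap and the function make_str2int_dict are made up for illustration.

    #include <pybind11/pybind11.h>
    #include <string>

    namespace py = pybind11;

    // Build {"0": 0, "1": 1, ...} as a native Python dict, entirely from C++.
    py::dict make_str2int_dict(int n) {
        py::dict d;
        for (int i = 0; i < n; ++i)
            d[py::str(std::to_string(i))] = i;  // goes through the Python/C API
        return d;
    }

    PYBIND11_MODULE(fastmap, m) {
        m.def("make_str2int_dict", &make_str2int_dict);
    }

    Lookups on the returned object are then exactly the py_dict case measured above, because it is the same built-in type; only the construction happens in C++.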