
How to create a python module in C++ that multiprocessing does not support


I am trying, and failing, to reproduce and understand a problem I saw where multiprocessing failed when using a Python module written in C++. My understanding was that the problem is that multiprocessing needs to pickle the function it is using. So I made my_module.cpp as follows:

#include <pybind11/pybind11.h>

int add(int input_number) {
    return input_number + 10;
}

PYBIND11_MODULE(my_module, m) {
    m.doc() = "A simple module implemented in C++ to add 10 to a number.";
    m.def("add", &add, "Add 10 to a number");
}

After

pip install pybind11

I compiled with:

c++ -O3 -Wall -shared -std=c++11 -fPIC $(python3 -m pybind11 --includes) my_module.cpp -o my_module$(python3-config --extension-suffix)

I can import my_module and it works as expected.

I can test if it can be pickled with:

import my_module
import pickle

# Use the add function
print(my_module.add(5))  # Outputs: 15

# Attempt to pickle the module
try:
    pickle.dumps(my_module)
except TypeError as e:
    print(f"Pickling error: {e}")  # Expected error

which outputs Pickling error: cannot pickle 'module' object, as expected.
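(For what it's worth, this doesn't seem specific to C++ extensions: pickle refuses to serialize any module object, pure-Python or compiled. A quick check with the standard library's math module:)

```python
import math
import pickle

# No module object is picklable, regardless of whether the module
# is implemented in Python or in C.
try:
    pickle.dumps(math)
except TypeError as e:
    print(f"Pickling error: {e}")  # Pickling error: cannot pickle 'module' object
```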

Now I tested multiprocessing and was surprised that it worked; I was expecting it to give a pickling error.

import my_module
from multiprocessing import Pool

# A wrapper function to call the C++ add function
def parallel_add(number):
    return my_module.add(number)

if __name__ == "__main__":
    numbers = [1, 2, 3, 4, 5]

    try:
        # Create a pool of worker processes
        with Pool(processes=2) as pool:
            results = pool.map(parallel_add, numbers)
        print(results)  # If successful, prints the results
    except Exception as e:
        print(f"Multiprocessing error: {e}")
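(A likely explanation for why this succeeds: pickle serializes a top-level function by reference, i.e. by its module and qualified name, not by its code, so parallel_add pickles fine and my_module is simply imported, or inherited via fork on Linux, in each worker. A minimal sketch, using a stand-in function, showing that only the name is stored:)

```python
import pickle

def parallel_add(number):
    # stand-in for the wrapper in the snippet above
    return number + 10

# Pickling a top-level function stores a reference (module name +
# qualified name), not the function's bytecode.
payload = pickle.dumps(parallel_add)
print(b"parallel_add" in payload)  # True
```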

How can I make a Python module in C++ with pybind11 which fails with multiprocessing because of a pickling error?

I am using Linux


Solution

  • I don't think your code ever tries to pickle a module as-is. If you redefine parallel_add to take the module as an argument, then use a functools.partial to pass my_module into it, you can force Python to do that.

    import my_module
    from functools import partial
    from multiprocessing import Pool
    
    # Same wrapper, but now takes a module as an argument
    def parallel_add(module, number):
        return module.add(number)
    
    if __name__ == "__main__":
        numbers = [1, 2, 3, 4, 5]
    
        with Pool(processes=2) as pool:
            results = pool.map(partial(parallel_add, my_module), numbers)
        print(results)
    

    This throws the error you were expecting:

    Traceback (most recent call last):
      File "<frozen runpy>", line 198, in _run_module_as_main
      File "<frozen runpy>", line 88, in _run_code
      File "/home/anerdw/stackoverflow/unpickling.py", line 13, in <module>
        results = pool.map(partial(parallel_add, my_module), numbers)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/lib/python3.11/multiprocessing/pool.py", line 367, in map
        return self._map_async(func, iterable, mapstar, chunksize).get()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/lib/python3.11/multiprocessing/pool.py", line 774, in get
        raise self._value
      File "/usr/lib/python3.11/multiprocessing/pool.py", line 540, in _handle_tasks
        put(task)
      File "/usr/lib/python3.11/multiprocessing/connection.py", line 205, in send
        self._send_bytes(_ForkingPickler.dumps(obj))
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/lib/python3.11/multiprocessing/reduction.py", line 51, in dumps
        cls(buf, protocol).dump(obj)
    TypeError: cannot pickle 'module' object
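
    The reason the partial forces the error: a functools.partial pickles its stored arguments along with a reference to its function, so the bound module has to be pickled too. A small sketch, with a standard-library module standing in for my_module:

```python
import math
import pickle
from functools import partial

def parallel_add(module, number):
    # hypothetical wrapper mirroring the one above
    return number

# The partial's saved argument (a module) is pickled with it, and
# module objects cannot be pickled.
try:
    pickle.dumps(partial(parallel_add, math))
except TypeError as e:
    print(f"Pickling error: {e}")  # Pickling error: cannot pickle 'module' object
```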
    

    You can also get a pickling-related multiprocessing error much more quickly by cutting out the wrapper and passing the C++ function to pool.map directly, which forces multiprocessing to pickle it.

    import my_module
    from multiprocessing import Pool
    
    if __name__ == "__main__":
        numbers = [1, 2, 3, 4, 5]
    
        with Pool(processes=2) as pool:
            results = pool.map(my_module.add, numbers)
        print(results)
    
    Traceback (most recent call last):
      File "<frozen runpy>", line 198, in _run_module_as_main
      File "<frozen runpy>", line 88, in _run_code
      File "/home/anerdw/stackoverflow/unpickling.py", line 8, in <module>
        results = pool.map(my_module.add, numbers)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/lib/python3.11/multiprocessing/pool.py", line 367, in map
        return self._map_async(func, iterable, mapstar, chunksize).get()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/lib/python3.11/multiprocessing/pool.py", line 774, in get
        raise self._value
      File "/usr/lib/python3.11/multiprocessing/pool.py", line 540, in _handle_tasks
        put(task)
      File "/usr/lib/python3.11/multiprocessing/connection.py", line 205, in send
        self._send_bytes(_ForkingPickler.dumps(obj))
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/lib/python3.11/multiprocessing/reduction.py", line 51, in dumps
        cls(buf, protocol).dump(obj)
    TypeError: cannot pickle 'PyCapsule' object