I am trying (and failing) to reproduce and understand a problem I saw where multiprocessing failed when using a Python module written in C++. My understanding was that multiprocessing needs to pickle the function it calls, and that this step was what failed. So I made `my_module.cpp` as follows:
```cpp
#include <pybind11/pybind11.h>

int add(int input_number) {
    return input_number + 10;
}

PYBIND11_MODULE(my_module, m) {
    m.doc() = "A simple module implemented in C++ to add 10 to a number.";
    m.def("add", &add, "Add 10 to a number");
}
```
After `pip install pybind11`, I compiled with:

```
c++ -O3 -Wall -shared -std=c++11 -fPIC $(python3 -m pybind11 --includes) my_module.cpp -o my_module$(python3-config --extension-suffix)
```
I can `import my_module` and it works as expected.

I can test whether it can be pickled with:
```python
import my_module
import pickle

# Use the add function
print(my_module.add(5))  # Outputs: 15

# Attempt to pickle the module
try:
    pickle.dumps(my_module)
except TypeError as e:
    print(f"Pickling error: {e}")  # Expected error
```
which outputs `Pickling error: cannot pickle 'module' object`, as expected.
Next I tested multiprocessing and was surprised that it worked. I was expecting it to give a pickling error.
```python
import my_module
from multiprocessing import Pool

# A wrapper function to call the C++ add function
def parallel_add(number):
    return my_module.add(number)

if __name__ == "__main__":
    numbers = [1, 2, 3, 4, 5]
    try:
        # Create a pool of worker processes
        with Pool(processes=2) as pool:
            results = pool.map(parallel_add, numbers)
            print(results)  # If successful, prints the results
    except Exception as e:
        print(f"Multiprocessing error: {e}")
```
How can I make a Python module in C++ with pybind11 that fails with multiprocessing because of a pickling error?

I am using Linux.
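I don't think your code, as written, ever tries to pickle a module. `pool.map` pickles `parallel_add` by reference (its module and qualified name), and each worker process then calls `my_module.add` through its own imported copy of the module, so the module object itself is never sent between processes. If you redefine `parallel_add` to take a module as an argument, then use `functools.partial` to pass `my_module` into it, you can force Python to do that.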
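To see why the original wrapper pickles fine, you can pickle a plain Python function that refers to a module global and inspect what is stored. This is a minimal sketch. It uses the standard-library `math` module as a stand-in for `my_module` (an assumption, so it runs without the compiled extension), and `parallel_sqrt` is a hypothetical wrapper, not code from the question.

```python
import math  # stand-in for my_module (assumption: any imported module behaves the same here)
import pickle
import pickletools


# Hypothetical wrapper, analogous to parallel_add in the question
def parallel_sqrt(number):
    # math is looked up as a global when the function is called; it is not part of the pickle
    return math.sqrt(number)


payload = pickle.dumps(parallel_sqrt)

# The disassembly shows only the function's module name and qualified name
pickletools.dis(payload)

# Unpickling in the same process looks the name up again and returns the same object
restored = pickle.loads(payload)
print(restored is parallel_sqrt)  # True
print(restored(16))  # 4.0

# Pickling the module object itself still fails
try:
    pickle.dumps(math)
except TypeError as e:
    print(f"Pickling error: {e}")  # Pickling error: cannot pickle 'module' object
```

The function pickles fine because only its name travels. The module is reached only when the function runs, inside whichever process calls it.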
```python
import my_module
from functools import partial
from multiprocessing import Pool

# Same wrapper, but now takes a module as an argument
def parallel_add(module, number):
    return module.add(number)

if __name__ == "__main__":
    numbers = [1, 2, 3, 4, 5]
    with Pool(processes=2) as pool:
        results = pool.map(partial(parallel_add, my_module), numbers)
        print(results)
```
This throws the error you were expecting:
```
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/anerdw/stackoverflow/unpickling.py", line 13, in <module>
    results = pool.map(partial(parallel_add, my_module), numbers)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/pool.py", line 367, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/pool.py", line 774, in get
    raise self._value
  File "/usr/lib/python3.11/multiprocessing/pool.py", line 540, in _handle_tasks
    put(task)
  File "/usr/lib/python3.11/multiprocessing/connection.py", line 205, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
TypeError: cannot pickle 'module' object
```
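The traceback shows the error is raised while the pool pickles the task (`_ForkingPickler.dumps(obj)` inside `_handle_tasks`), before any worker runs the function. You should therefore be able to reproduce it without a `Pool` at all, because a `functools.partial` pickles the arguments it has bound. This is a minimal sketch, again assuming the standard-library `math` module as a stand-in for `my_module`. `call_with` is a hypothetical name.

```python
import math  # stand-in for my_module (assumption)
import pickle
from functools import partial


def call_with(module, number):
    # Hypothetical wrapper that takes the module as an argument, like the parallel_add above
    return module.sqrt(number)


task = partial(call_with, math)
print(task(9))  # 3.0, calling the partial locally is fine

try:
    # A partial pickles its function *and* its bound arguments, so this must pickle the module
    pickle.dumps(task)
except TypeError as e:
    print(f"Pickling error: {e}")  # Pickling error: cannot pickle 'module' object
```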
You can also get a pickling-related multiprocessing error more directly by cutting out the wrapper and passing the C++ function itself to `pool.map`, so that it has to be pickled:
```python
import my_module
from multiprocessing import Pool

if __name__ == "__main__":
    numbers = [1, 2, 3, 4, 5]
    with Pool(processes=2) as pool:
        results = pool.map(my_module.add, numbers)
        print(results)
```
```
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/anerdw/stackoverflow/unpickling.py", line 8, in <module>
    results = pool.map(my_module.add, numbers)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/pool.py", line 367, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/pool.py", line 774, in get
    raise self._value
  File "/usr/lib/python3.11/multiprocessing/pool.py", line 540, in _handle_tasks
    put(task)
  File "/usr/lib/python3.11/multiprocessing/connection.py", line 205, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
TypeError: cannot pickle 'PyCapsule' object
```
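As far as I can tell, the `PyCapsule` comes from how pybind11 builds its functions. `my_module.add` is a `builtin_function_or_method` whose `__self__` is a capsule holding pybind11's internal function record; `type(my_module.add.__self__)` should show this. CPython pickles a builtin function by name only when its `__self__` is a module or absent. Otherwise the function's `__reduce__` returns `getattr(__self__, name)`, so pickle has to serialize `__self__` itself. The sketch below shows the same mechanism with standard-library objects only. It uses a lock's bound `acquire` method as an analogy for the capsule-bound `add`. That analogy is an assumption, not pybind11 code.

```python
import math
import pickle
import threading

# math.sqrt is a builtin whose __self__ is the math module, so it reduces to just its name
print(type(math.sqrt).__name__)  # builtin_function_or_method
print(math.sqrt.__reduce__())    # sqrt
print(pickle.loads(pickle.dumps(math.sqrt)) is math.sqrt)  # True

# A builtin bound to a non-module object reduces to getattr(obj, name),
# so pickle must serialize obj itself (analogous to the PyCapsule behind my_module.add)
lock = threading.Lock()
bound = lock.acquire
print(type(bound).__name__)  # builtin_function_or_method
func, args = bound.__reduce__()
print(func is getattr, args[0] is lock, args[1])  # True True acquire

try:
    pickle.dumps(bound)
except TypeError as e:
    # The lock cannot be pickled; the exact message may vary by Python version
    print(f"Pickling error: {e}")
```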