I am currently working on Mathics, a Mathematica-like library for Python, and I want to parallelize my program with the multiprocessing module.
But I have a problem: to evaluate expressions with Mathics, I have to open a session by creating an instance of the class MathicsSession(). Creating this instance takes a long time (approximately 5 seconds), and the instance can't be pickled by the multiprocessing module, which means I have to create a new session on every call to my function in the pool map. That makes my multiprocessing code slower than my previous code, which is a problem because the main goal of my project is to be highly parallelizable...
I'm not sure I'm being clear, but what I want is to create my sessions just once, maybe once per CPU (I don't know if that's possible). Here's a code example of what I mean:
from multiprocessing import Pool, cpu_count
from mathics.session import MathicsSession

def evaluation(session):
    return session.evaluate("5+5")

if __name__ == '__main__':
    sessions = []
    for i in range(50):
        sessions.append(MathicsSession())
    with Pool(cpu_count()) as pool:
        result = pool.map(evaluation, sessions)
Here's the error showing that my sessions can't be pickled:
Traceback (most recent call last):
  File "/home/thales_usradm/PycharmProjects/Test/test2.py", line 16, in <module>
    result = pool.map(evaluation, sessions)
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 367, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 774, in get
    raise self._value
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 540, in _handle_tasks
    put(task)
  File "/usr/lib/python3.10/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/usr/lib/python3.10/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
AttributeError: Can't pickle local object 'Builtin.contribute.<locals>.check_options'
Maybe I can make the instance serializable?
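For what it's worth, this particular error can be reproduced without Mathics at all: any object that holds a reference to a function defined inside another function (a "local object") cannot be pickled, because pickle serializes functions by their importable qualified name. The FakeSession, make_handler, and check_options names below are made up for illustration; they only mimic the shape of the real failure:

```python
import pickle

def make_handler():
    # A function defined inside another function has the qualified name
    # 'make_handler.<locals>.check_options', which pickle cannot import.
    def check_options(expr):
        return expr
    return check_options

class FakeSession:
    """Hypothetical stand-in for MathicsSession: stores a local function."""
    def __init__(self):
        self.validator = make_handler()

try:
    pickle.dumps(FakeSession())
    print("pickled OK")
except (AttributeError, pickle.PicklingError) as exc:
    # Same "Can't pickle local object" failure as the traceback above
    print(f"cannot pickle: {exc}")
```

So making the real session serializable would mean chasing down every such attribute inside Mathics, which is not practical from the outside.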
I am not sure why you would want to create a new session for each invocation of evaluation if, as you say, creating a session is a rather expensive operation, even if you were able to pickle a session. Instead, you would want to create one session per pool process that is reused across the multiple invocations of your worker function, evaluation, executing in that process.
To do that, you can specify a pool initializer when you create your pool. This is a function that is executed once in each pool process when the pool starts up, and it typically creates one or more global variables that your worker function can then access.
Perhaps this is a more realistic example. Instead of re-computing the same value 50 times, we will evaluate 50 different expressions:
from multiprocessing import Pool, cpu_count
from mathics.session import MathicsSession

def init_pool_processes():
    """Create one session per pool process."""
    global session
    session = MathicsSession()

def evaluation(n):
    # session is a process-global created by the pool initializer
    return session.evaluate(f"5+{n}")

if __name__ == '__main__':
    with Pool(cpu_count(), initializer=init_pool_processes) as pool:
        result = pool.map(evaluation, range(50))
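The same pattern also works with concurrent.futures.ProcessPoolExecutor, which has accepted initializer and initargs arguments since Python 3.7. To keep the sketch runnable without Mathics installed, ExpensiveSession below is a hypothetical stand-in for MathicsSession (its evaluate just uses eval on the arithmetic string):

```python
from concurrent.futures import ProcessPoolExecutor
import os
import time

class ExpensiveSession:
    """Stand-in for MathicsSession: slow to build, so build it once per worker."""
    def __init__(self):
        time.sleep(0.1)  # simulate the ~5 s startup cost

    def evaluate(self, expr):
        return eval(expr)  # placeholder for the real session.evaluate

session = None

def init_worker():
    # Runs once in each worker process; the global survives across tasks.
    global session
    session = ExpensiveSession()

def evaluation(n):
    return session.evaluate(f"5+{n}")

if __name__ == '__main__':
    with ProcessPoolExecutor(max_workers=os.cpu_count(),
                             initializer=init_worker) as executor:
        results = list(executor.map(evaluation, range(50)))
    print(results[:5])  # [5, 6, 7, 8, 9]
```

Either way, the expensive construction happens cpu_count() times in total rather than once per task, and the per-task cost is just the pickling of the small integer argument and result.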