Tags: python, sqlalchemy, multiprocessing, pathos

When using the multiprocessing library, how do I bind resources to specific processes?


Say I have 50 processes, and I'm using them to operate on (say) 20000 different input values. (I'm using the pathos library, which I think operates similarly to the multiprocessing library in Python.)

import pathos.multiprocessing

# pathos sizes a process pool with `nodes`, not `threads`
thread_pool = pathos.multiprocessing.ProcessingPool(nodes=50)
thread_pool.map(function, inputs)

I want to create one SQLAlchemy database engine per process (I don't have the resources to create one per input value). Every input handled by a given process should then use that process's engine.

How can I do this?


Solution

  • I'm the author of both pathos and multiprocess. pathos actually uses multiprocess under the hood, though that may not be obvious. You can reach the underlying Pool from pathos:

    >>> import pathos
    >>> pathos.pools._ProcessPool 
    <class 'multiprocess.pool.Pool'>
    

    The above is the raw Pool directly from multiprocess. pathos.pools.ProcessPool is a higher-level wrapper with some additional features, but it does not (yet) expose all the keyword arguments of the lower-level Pool.
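One keyword argument the raw Pool accepts is `initializer` (with `initargs`), which runs a function once in each worker process as it starts. That is a natural place to build a per-process resource. The following is a minimal sketch, not a definitive implementation. It uses the standard library's `multiprocessing.Pool`, which multiprocess is a fork of, and assumes `pathos.pools._ProcessPool` takes the same arguments. `make_engine`, `init_worker`, `work`, and `run` are hypothetical names, and `make_engine` is a stand-in for `sqlalchemy.create_engine` so the sketch runs without a database.

```python
import multiprocessing
import os

# Per-process global. Each worker process fills it in once, via the initializer.
_engine = None


def make_engine(url):
    # Hypothetical stand-in for sqlalchemy.create_engine(url).
    # It records which process built it so the binding can be checked.
    return {"url": url, "pid": os.getpid()}


def init_worker(url):
    # Called once in every worker process when the pool starts that worker.
    global _engine
    _engine = make_engine(url)


def work(value):
    # Every task run by this worker sees the engine built in init_worker.
    return value * 2, _engine["pid"], os.getpid()


def run(inputs, processes=4, url="sqlite://"):
    # initializer/initargs are forwarded to each worker at startup.
    with multiprocessing.Pool(
        processes, initializer=init_worker, initargs=(url,)
    ) as pool:
        return pool.map(work, inputs)


if __name__ == "__main__":
    results = run(range(20))
    engine_pids = {engine_pid for _, engine_pid, _ in results}
    print("distinct engines used:", len(engine_pids))
```

Under those assumptions, adapting this to the question would mean replacing `make_engine` with a real `sqlalchemy.create_engine` call and `multiprocessing.Pool` with `pathos.pools._ProcessPool`. The number of engines created is then bounded by the number of worker processes, not by the number of inputs.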