python multithreading architecture gevent preforking

Preforking a Multithreaded Python application


I have a Python program that is already multithreaded and I'd like to replace some of the threads with processes in order to reduce context switching and utilize gevent for async I/O.
The main process is I/O bound so I'd like to use gevent in order to be able to handle a lot of concurrent I/O. We'll call it the Receiver component of my system.

The rest of the program is mostly CPU bound, so I'd like each worker process to run several threads that handle requests from the Receiver.
I chose threads for handling multiple requests within one process because threads are cheaper to create and destroy than processes. If the program receives a lot of requests, it can automatically start more threads to handle them. When the load decreases, it can get rid of the extra threads to avoid the overhead of unnecessary context switching.

Forking with gevent can cause some problems and gipc exists exactly to solve those problems.
The worker threads do sometimes read from various sources such as caches and databases, but if I understand correctly a thread releases the GIL while it is blocked on I/O, so another thread can run in the meantime.
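
A minimal sketch for checking that assumption, using only the standard library. Here time.sleep stands in for a blocking read from a cache or database (an assumption for illustration; the real I/O calls are not shown in this post), and blocking_io and timed_threads are hypothetical helper names. If the GIL is released while a thread waits, four threads that each wait 0.2 seconds should finish in roughly 0.2 seconds overall rather than 0.8.

```python
import threading
import time


def blocking_io(delay):
    # Stand-in for a blocking read (assumption): time.sleep releases the GIL
    # while waiting, as blocking socket and file reads do in CPython.
    time.sleep(delay)


def timed_threads(count, delay):
    """Run `count` threads that each block for `delay` seconds; return the wall time."""
    threads = [
        threading.Thread(target=blocking_io, args=(delay,)) for _ in range(count)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = timed_threads(4, 0.2)
    print(f"4 threads, 0.2s of blocking each: {elapsed:.2f}s total")
```

This only shows that waiting overlaps. CPU-bound Python code in those same threads would still run one thread at a time.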

If I do decide to use gevent inside my workers, I can (I think) avoid monkey-patching the threading module and assign a greenlet pool to each worker process. When gevent is combined with threads, will the GIL still be released when I/O occurs, so that another thread can run until the I/O call completes?

Finally there's another process which saves the response to a database. It's naturally I/O bound so gevent would be an excellent choice to perform this action.

I have read about the dangers of mixing threads and prefork. I'm not going to create any threads in the main process so no locking mechanisms such as mutexes will be copied to the child processes. I am not going to fork any of my child processes. Is it safe to assume I'm not in trouble in any stage of this design? Does Python mitigate some of the problems with preforking and threading?


Solution

  • In CPython, the GIL prevents threads within a single process from executing Python bytecode in parallel. So while you can use multithreading or async I/O to handle many requests per worker, for true parallelism you need Python's multiprocessing package. You should probably use a Pool with maxtasksperchild set to a few hundred tasks, and you must pay attention to the number of worker processes. If your task is truly hard on the CPU, you can stall your system when no cores are left for "other stuff". The right number of processes can only be determined through experimentation.
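
A rough sketch of the Pool setup described above, under stated assumptions. multiprocessing.Pool takes the recycling limit as maxtasksperchild, and processes sets the number of workers. The cpu_bound task and the run_pool helper are hypothetical placeholders, the value 200 is just an example of "a few hundred", and the "fork" start method is chosen here because the question is about preforking on Unix (it is not available on Windows).

```python
import multiprocessing as mp
import os


def cpu_bound(n):
    """Hypothetical CPU-heavy task: sum of squares below n, tagged with the worker PID."""
    return os.getpid(), sum(i * i for i in range(n))


def run_pool(inputs, workers, tasks_per_child=200):
    # Assumption: Unix "fork" start method, matching the prefork design.
    ctx = mp.get_context("fork")
    # Each worker process is replaced after handling `tasks_per_child` tasks,
    # which limits the effect of leaks in long-running workers.
    with ctx.Pool(processes=workers, maxtasksperchild=tasks_per_child) as pool:
        return pool.map(cpu_bound, inputs)


if __name__ == "__main__":
    # Leave one core free for the Receiver and the OS; this is only a
    # starting point and should be tuned by measurement.
    workers = max(1, (os.cpu_count() or 2) - 1)
    results = run_pool([100_000] * 8, workers)
    print(len({pid for pid, _ in results}), "worker process(es) handled the tasks")
```

The pool is created before any threads exist in the parent, which is in line with the plan in the question to keep the main process free of threads at fork time.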