GIL is killing I/O-bound thread

I've got a website written mostly in Python. The Python process that handles Python-bound requests has a dispatch thread which fetches requests from the web server and simply dispatches them to a thread-pool for handling. The work done in the dispatch thread, thus, is pretty simple; it just reads requests over a Unix socket and does a bit of synchronization on the thread pool. Under normal circumstances, it is capable of dispatching over 2,000 requests per second.

Something weird happens sometimes, however. One part of the website does some image processing on uploaded files, and since the image processing algorithm is written entirely in Python, it takes a bit of time, spinning on the CPU. On larger images, it can take 5 seconds or more. That's fine in itself, though; the weird thing is that while it does its processing, throughput on the dispatch thread drops tremendously. While the image processor is running, dispatch throughput drops to some 20-30 requests per second -- almost two orders of magnitude!

This causes some minor trouble for me, since during busy hours, the Python handler receives some 50-100 requests per second, and therefore is unable to keep up. For image processing requests that take some 3 seconds or more, the buffers start filling up and the web server is consequently forced to start dropping requests bound for Python.

I wrote a visualization tool to help debug the problem, and this image (cropped above) demonstrates what is happening. The dispatch of each request is plotted as a line along the X axis, with each subsequent request plotted at the next Y coordinate. Each vertical grid-line marks one second, and the red grid-line is where my HTTP server logs that it is starting to drop requests. It can clearly be seen that the dispatch rate slows down a lot about 2.5 seconds prior to that, and comparing with the access logs, that is where the image processor kicked off.

My hypothesis is that the CPU-bound image processor thread is hogging the GIL, and that the dispatcher has to wait for some particular "processing window" to complete before the CPU-bound thread voluntarily releases the GIL for other threads to run. The dispatcher thread, for its part, releases the GIL each time it goes into a blocking syscall, and then has to wait for another entire processing window to complete before it is allowed to process the next request.

If this hypothesis is correct, then I realize that I could fix this problem by forking off a separate process to do the image processing work. That would complicate the code and make it uglier, however, so I'd like to avoid that if possible.

Thus: Is there any way to avoid this apparent GIL problem? Can I make it so that the dispatcher thread doesn't relinquish the GIL so easily, allowing it to work off some backlog in between processing windows? Can the GIL CPU window be "tweaked", or can I perhaps assign some lower "GIL priority" to the CPU-bound thread or something like that? Is there some other way around it? Or have I perhaps misunderstood the problem entirely?
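The closest knob I have come across so far is the interpreter's "switch interval", which CPython 3.2 and later expose through sys.getswitchinterval() and sys.setswitchinterval(). Below is a minimal sketch of changing it temporarily; the helper name with_switch_interval is made up for illustration, and I have not verified that a different interval actually helps in my situation.

```python
import sys


def with_switch_interval(seconds, func, *args):
    """Call func(*args) under a temporary switch interval (hypothetical helper).

    The switch interval is process-wide, so every thread is affected
    while func runs, not only the calling thread.
    """
    previous = sys.getswitchinterval()
    sys.setswitchinterval(seconds)
    try:
        return func(*args)
    finally:
        # Always restore the previous interval, even if func raises.
        sys.setswitchinterval(previous)
```

A smaller interval should make a waiting thread request the GIL sooner, presumably at the price of more switching overhead for the CPU-bound thread.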

Sorry for being long-winded, but I couldn't really figure out a more concise way to describe the situation.

Solution

I did manage to figure out why this happened. As it turns out, blocking syscalls were not so much the problem in themselves. Rather, part of the thread pool implementation made the dispatch thread wait until a worker thread acknowledged that it had taken the request (for accounting reasons, basically). The worker did so by signalling a condition variable that the dispatch thread waited on.
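To illustrate the pattern, here is a minimal sketch of that kind of lock-step handoff. It is not my actual code: the class name LockStepPool and all of its members are made up for illustration, and it assumes a single dispatch thread calling submit().

```python
import collections
import threading


class LockStepPool:
    """Hypothetical pool whose submit() blocks until a worker acknowledges."""

    def __init__(self, num_workers, handler):
        self._cond = threading.Condition()
        self._slot = collections.deque()  # requests not yet taken by a worker
        self._submitted = 0               # requests handed in by the dispatcher
        self._taken = 0                   # requests acknowledged by workers
        self._handler = handler
        for _ in range(num_workers):
            threading.Thread(target=self._worker, daemon=True).start()

    def submit(self, request):
        with self._cond:
            self._slot.append(request)
            self._submitted += 1
            target = self._submitted
            self._cond.notify_all()
            # Lock-step: the dispatcher sleeps here until some worker has
            # woken up, taken the request and signalled the condition.
            while self._taken < target:
                self._cond.wait()

    def _worker(self):
        while True:
            with self._cond:
                while not self._slot:
                    self._cond.wait()
                request = self._slot.popleft()
                self._taken += 1
                self._cond.notify_all()  # acknowledge to the dispatcher
            # The actual work happens outside the lock.
            self._handler(request)
```

Every submit() in this sketch needs two thread switches (dispatcher to worker and back) before the dispatcher can read the next request.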

I tried reimplementing the thread-pool such that the dispatch thread could simply post the request without having to work in lock-step with a worker thread, and that seems to have made the problem go away entirely. Visualizing the request dispatching over a period of image processing now shows no slow-down whatsoever. Presumably, then, the switching of the GIL between two threads created a larger window for the third, CPU-bound thread to snatch it for a longer period.
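The reimplemented pool works roughly along the lines of the sketch below. Again, this is not my actual code: PostingPool, post() and join() are made-up names, and it simply leans on the standard library's queue.Queue so that the dispatch thread never waits for a worker.

```python
import queue
import threading


class PostingPool:
    """Hypothetical pool where post() returns without waiting for a worker."""

    def __init__(self, num_workers, handler):
        self._requests = queue.Queue()
        self._handler = handler
        for _ in range(num_workers):
            threading.Thread(target=self._worker, daemon=True).start()

    def post(self, request):
        # Enqueue and return immediately; no acknowledgement is awaited.
        self._requests.put(request)

    def _worker(self):
        while True:
            request = self._requests.get()
            try:
                self._handler(request)
            finally:
                # Mark the request as done so that join() can return.
                self._requests.task_done()

    def join(self):
        # Block until every posted request has been handled.
        self._requests.join()
```

Any accounting that used to happen in the acknowledgement step would have to move into the worker itself in a design like this.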

The lesson to be learned, then, I guess, is that current CPython (I'm using 3.4.2 on the server running this) seems to be fine with mixing I/O-bound and CPU-bound threads, but that two or more threads working in lock-step with each other may be starved by a CPU-bound thread.