I’ve been digging into FastAPI’s handling of synchronous and asynchronous endpoints, and I’ve come across a few things that I’m trying to understand more clearly, especially with regard to how blocking operations behave in Python.
From what I understand, when a synchronous route (defined with def) is called, FastAPI offloads it to a separate thread from the thread pool to avoid blocking the main event loop. This makes sense, as the thread can be blocked (e.g., time.sleep()), but the event loop itself doesn’t get blocked because it continues handling other requests.
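To make that concrete, here is roughly what happens for a def route (a simplified sketch, not FastAPI's actual source; the endpoint function is made up, but run_in_threadpool is the real Starlette helper that FastAPI uses):

import time
import anyio
from starlette.concurrency import run_in_threadpool

def blocking_endpoint() -> str:
    time.sleep(2)  # blocks the worker thread it runs on
    return 'done'

async def handle_request() -> str:
    # A def route is awaited on a worker thread, so the event loop
    # thread stays free to serve other requests in the meantime.
    return await run_in_threadpool(blocking_endpoint)

print(anyio.run(handle_request))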
But here’s my confusion: If the function is truly blocking (e.g., it’s waiting for something like time.sleep()), how is the event loop still able to execute other tasks concurrently? Isn’t the Python interpreter supposed to execute just one thread at a time?
Here is an example:
from fastapi import FastAPI
import asyncio

app = FastAPI()

@app.get('/sync')
def tarefa_sincrona():
    print('Sync')
    total = 0
    for i in range(10223424 * 1043):  # long CPU-bound loop standing in for blocking work
        total += i
    print('Sync task done')

@app.get('/async')
async def tarefa_assincrona():
    print('Async task')
    await asyncio.sleep(5)
    print('Async task done')
If I make two requests almost at the same time, the first to the sync endpoint and the second to the async endpoint, I expected the event loop to be blocked. However, in reality, the two requests are executed "in parallel."
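For reference, this is how I fire the two requests nearly simultaneously (a small stdlib client; localhost:8000 is just the default uvicorn address and the paths are from the example above):

import threading
import time
from urllib.request import urlopen

BASE = 'http://localhost:8000'  # assumed address of the running server

def hit(path: str) -> None:
    start = time.perf_counter()
    urlopen(BASE + path).read()
    print(f'{path} finished after {time.perf_counter() - start:.1f}s')

# Start both requests at nearly the same time, each on its own thread.
threads = [threading.Thread(target=hit, args=(p,)) for p in ('/sync', '/async')]
for t in threads:
    t.start()
for t in threads:
    t.join()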
If the function is truly blocking (e.g., it’s waiting for something like time.sleep()), how is the event loop still able to execute other tasks concurrently? Isn’t the Python interpreter supposed to execute just one thread at a time?
Only one thread is indeed executed at a time. The flaw is to assume that time.sleep() keeps the thread holding the GIL while it waits - as another answerer has pointed out, it does not. Which is to say that time.sleep() does block its thread, but its C implementation releases the GIL around the underlying OS sleep call (via the Py_BEGIN_ALLOW_THREADS macro), letting the interpreter run a different thread in the meantime. This isn't clearly described in the linked answer, however.
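A minimal way to see this (a standalone sketch, independent of FastAPI): run two sleeps on two threads and time them. If time.sleep() held the GIL, the total would be around 2 seconds; because it releases it, the sleeps overlap and the total is around 1 second:

import threading
import time

def snooze() -> None:
    time.sleep(1)  # blocks this thread, but releases the GIL while waiting

start = time.perf_counter()
threads = [threading.Thread(target=snooze) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Prints ~1.0, not ~2.0: both sleeps ran concurrently.
print(f'elapsed: {time.perf_counter() - start:.2f}s')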
Concurrency in Python with the GIL works as follows:
- only the thread currently holding the GIL executes Python bytecode;
- the interpreter forces a switch between runnable threads every few milliseconds (sys.getswitchinterval(), 5 ms by default);
- a C-level blocking call (I/O, sleeps) can release the GIL for its whole duration, so other threads run while it waits.
This is what happens with time.sleep() - it's coded in such a way that, while it's waiting on the underlying OS sleep call, it voluntarily releases the GIL (much as a thread waiting for I/O would), allowing the interpreter to hand the GIL to another thread, such as the one running the event loop.
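The same effect seen from the event loop's side (a sketch using asyncio.to_thread, which does essentially what FastAPI's threadpool offloading does): a heartbeat task keeps ticking while time.sleep() blocks a worker thread:

import asyncio
import time

async def heartbeat() -> None:
    for i in range(4):
        print(f'event loop alive: tick {i}')
        await asyncio.sleep(0.5)

async def main() -> None:
    # time.sleep blocks a threadpool worker, not the event loop thread.
    await asyncio.gather(asyncio.to_thread(time.sleep, 2), heartbeat())

asyncio.run(main())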
If time.sleep() didn't release the GIL, or if you were running a long non-I/O computation, a single thread would hog the GIL and starve the event loop. (Strictly speaking, CPython still forces a thread switch every few milliseconds, which is why the CPU-bound /sync endpoint above doesn't freeze /async outright - but the event loop only makes progress in those brief windows.)
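You can verify this by replacing the sleep with a CPU-bound loop (a rough benchmark sketch; exact numbers vary by machine): splitting the work across two threads takes about as long as doing it all on one, because only one thread holds the GIL at any instant:

import threading
import time

def burn(n: int) -> None:
    total = 0
    for i in range(n):
        total += i  # pure Python bytecode: never releases the GIL voluntarily

N = 20_000_000

start = time.perf_counter()
burn(N)
print(f'one thread, all the work: {time.perf_counter() - start:.2f}s')

start = time.perf_counter()
threads = [threading.Thread(target=burn, args=(N // 2,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Roughly the same elapsed time (often slightly worse): no parallel speed-up.
print(f'two threads, half each:   {time.perf_counter() - start:.2f}s')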
In GIL-bound Python, the only way to execute CPU-bound code (code that doesn't actively release the GIL the way time.sleep() does) in parallel is at the process level - either multiprocessing or concurrent.futures.ProcessPoolExecutor - as each process gets its own GIL.
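A sketch of that process-level alternative (the worker count and workload size are arbitrary; on a multi-core machine the split genuinely runs in parallel):

import time
from concurrent.futures import ProcessPoolExecutor

def burn(n: int) -> int:
    total = 0
    for i in range(n):
        total += i
    return total

if __name__ == '__main__':  # guard required where processes are spawned, e.g. Windows/macOS
    N = 20_000_000
    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=2) as pool:
        # Each half runs in its own process, with its own GIL.
        results = list(pool.map(burn, [N // 2, N // 2]))
    print(f'two processes: {time.perf_counter() - start:.2f}s, sum={sum(results)}')

From an async endpoint you would typically await loop.run_in_executor(pool, burn, n) instead of calling the pool directly, so the event loop isn't blocked while the processes work.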
The multiprocessing docs hint very clearly at the above:

The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads.
And the threading docs:

threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously
Reading between the lines, this is much the same as saying that tasks bound by anything other than I/O won't achieve any noteworthy concurrency through threading.