python async-await gunicorn fastapi uvicorn

FastAPI application serving an ML model has blocking code?

We have an ML model served with Flask. Load testing the Flask application with Gatling (https://gatling.io/) showed very low throughput: it could not handle many requests per second. We therefore moved to FastAPI.

Serving it locally in a Docker container with uvicorn or gunicorn worked well. However, we have noticed that the application sometimes doesn't respond for minutes:

[Image: Gatling Load Test - Local Docker Container]

In this image you can see that the application responds in "batches". Serving our application in a Kubernetes cluster leads to container restarts, because the container fails its readiness/liveness probes.

We have asked this question on uvicorn's GitHub. However, I don't think we will get an answer there. We suspect that we have written code that blocks the main thread, and that this is why our FastAPI application doesn't answer for minutes.

Snippet of the application endpoint:

async def verify_client(token: str):
    credentials_exception = HTTPException(
        status_code=status.HTTP_401_UNAUTHORIZED,
        detail="Could not validate credentials",
        headers={"WWW-Authenticate": "Bearer"},
    )
    try:
        return jwt.decode(token, SECRET_KEY, algorithms=[ALGORITHM], audience=AUDIENCE)
    except JWTError:
        raise credentials_exception


@app.post("/score", response_model=cluster_api_models.Response_Model)
async def score(request: cluster_api_models.Request_Model, token: str = Depends(oauth2_scheme)):
    logger.info("Token: {0}".format(token))
    await verify_client(token)
    result = await do_score(request)
    return result

The await do_score(request) call contains all the preprocessing and prediction code. It uses a gensim fastText model to create document vectors and a scikit-learn K-Means model. do_score() is defined as async def do_score(request). From the FastAPI documentation, we thought this would be enough to make our application asynchronous. However, it doesn't look like it: requests are still processed sequentially, and additionally the application doesn't respond for minutes. The method also includes a nested for loop, O(n²); I'm not sure whether that can cause blocking too.

I hope the information provided is enough to get started. If you need more information about the code, please tell me. I will then need to change some variable names in the code. Thank you very much in advance!

Solution

Of course something will block your application if it is not fully async. async is just a keyword here; it does not make blocking code non-blocking.

Even if you define a function with async def, anything blocking that it does underneath will block the event loop, and with it your entire app. Not convinced? Test it.

import time

@app.get("/dummy")
async def dummy():
    time.sleep(5)  # blocking call: nothing else runs on the event loop meanwhile

Let's send 3 concurrent requests to it.

for _ in {1..3}; do curl http://127.0.0.1:8000/dummy & done; wait

This will take 15+ seconds, because the three requests are served one after another.
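The same effect can be reproduced without a server. The following is a minimal sketch using plain asyncio; blocking_handler and run_three_blocking are hypothetical names standing in for the /dummy endpoint and the three curl requests, and the sleep is shortened to 0.2 seconds so it finishes quickly.

```python
import asyncio
import time


async def blocking_handler() -> None:
    # Stand-in for the /dummy endpoint: a blocking call inside a coroutine.
    time.sleep(0.2)


async def run_three_blocking() -> float:
    start = time.perf_counter()
    # Schedule three "requests" concurrently on a single event loop.
    await asyncio.gather(*(blocking_handler() for _ in range(3)))
    return time.perf_counter() - start


elapsed = asyncio.run(run_three_blocking())
# The total is roughly three times one sleep: the handlers ran back to back.
print(f"three blocking handlers took {elapsed:.2f}s")
```

Even though the three coroutines are scheduled together, each time.sleep call holds the only thread, so the durations add up.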

Let's dive deeper. I said async def is just fancy syntax for declaring a coroutine. Why? See PEP 492:

async def functions are always coroutines, even if they do not contain await expressions.
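A quick way to check that statement yourself is the sketch below; no_await is a hypothetical function that contains no await at all, yet the standard inspect module still reports it as a coroutine function.

```python
import asyncio
import inspect


async def no_await() -> int:
    # No await anywhere in the body, but this is still a coroutine function.
    return 42


print(inspect.iscoroutinefunction(no_await))  # prints True

coro = no_await()  # calling it only creates a coroutine object
print(inspect.iscoroutine(coro))  # prints True
print(asyncio.run(coro))  # the body runs only here; prints 42
```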

Why does it matter?

When you use await inside a coroutine, you are telling the event loop that it may keep going while the result is pending. It does exactly that: it switches to another coroutine and runs it.

What is the difference?

Basically, a coroutine that awaits doesn't hold up the event loop while it waits for a result; the loop just keeps going with other work. A normal blocking call, by contrast, holds the thread until it returns, so nothing else can run in the meantime.
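To see the difference, here is a minimal sketch in plain asyncio where the handler awaits asyncio.sleep instead of calling time.sleep. The names cooperative_handler and run_three_cooperative are hypothetical, and the 0.3 second delay is only an illustrative value.

```python
import asyncio
import time


async def cooperative_handler() -> None:
    # await hands control back to the event loop while the sleep is pending.
    await asyncio.sleep(0.3)


async def run_three_cooperative() -> float:
    start = time.perf_counter()
    # The three sleeps overlap because each coroutine yields at its await.
    await asyncio.gather(*(cooperative_handler() for _ in range(3)))
    return time.perf_counter() - start


elapsed = asyncio.run(run_three_cooperative())
# Expect roughly the duration of one sleep, not the sum of three.
print(f"three cooperative handlers took {elapsed:.2f}s")
```

This only helps when the work itself is awaitable (network or disk I/O through async libraries). CPU-bound code such as model inference has no await point to yield at.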

Since we both know it would block, what can you do?

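You might want to use a job/task queue library like Celery, so that the heavy preprocessing and prediction run in separate worker processes instead of on the event loop.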
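A task queue like Celery needs a broker and separate workers, so it is hard to show in a few lines. As a lighter-weight illustration of the same idea (getting blocking work off the event loop), the sketch below swaps in a different technique: the standard library's loop.run_in_executor with the default thread pool. blocking_score is a hypothetical stand-in for do_score's preprocessing and prediction, simulated here with time.sleep.

```python
import asyncio
import time


def blocking_score(doc_id: int) -> int:
    # Hypothetical stand-in for the blocking preprocessing + prediction work.
    time.sleep(0.3)
    return doc_id * 2


async def score_offloaded(doc_id: int) -> int:
    loop = asyncio.get_running_loop()
    # None selects the loop's default ThreadPoolExecutor; the event loop
    # stays free to serve other requests while the worker thread runs.
    return await loop.run_in_executor(None, blocking_score, doc_id)


async def serve_three():
    start = time.perf_counter()
    results = await asyncio.gather(*(score_offloaded(i) for i in range(3)))
    return list(results), time.perf_counter() - start


results, elapsed = asyncio.run(serve_three())
print(results, f"{elapsed:.2f}s")  # results are [0, 2, 4]
```

Note that for CPU-bound pure-Python code the threads still contend for the GIL, so this mainly keeps the loop responsive (for example to health probes) rather than making scoring faster. Moving the work to other processes, as a Celery worker does, avoids that limitation.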