We have an ML model served with Flask. Load testing the Flask application with Gatling (https://gatling.io/) showed very poor performance: it could not handle many requests per second. Therefore, we moved to FastAPI.
Serving it locally in a Docker container with uvicorn or gunicorn worked well. However, we noticed that the application doesn't respond for minutes:

[Figure: Gatling Load Test - Local Docker Container]

In this image you can see that the application responds in "batches". When we serve the application in a Kubernetes cluster, the container gets restarted, because it fails its readiness/liveness probes.
We have asked this question on uvicorn's GitHub. However, I don't think we will get an answer there. We suspect that we have written code that blocks the main thread, and that this is why our FastAPI application doesn't answer for minutes.
Snippet of the application endpoint:
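```python
from fastapi import Depends, HTTPException, status
from jose import JWTError, jwt  # assumed: python-jose, as in the FastAPI JWT tutorial

# app, logger, oauth2_scheme, SECRET_KEY, ALGORITHM, AUDIENCE,
# cluster_api_models and do_score are defined elsewhere in the application.


async def verify_client(token: str):
    credentials_exception = HTTPException(
        status_code=status.HTTP_401_UNAUTHORIZED,
        detail="Could not validate credentials",
        headers={"WWW-Authenticate": "Bearer"},
    )
    try:
        return jwt.decode(token, SECRET_KEY, algorithms=[ALGORITHM], audience=AUDIENCE)
    except JWTError:
        raise credentials_exception


@app.post("/score", response_model=cluster_api_models.Response_Model)
async def score(request: cluster_api_models.Request_Model, token: str = Depends(oauth2_scheme)):
    logger.info("Token: {0}".format(token))
    await verify_client(token)
    result = await do_score(request)  # preprocessing + prediction
    return result
```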
The `await do_score(request)` call contains all the preprocessing and prediction code. It uses a gensim fastText model to create document vectors and a scikit-learn K-Means model. `do_score()` is defined with `async def do_score(request)`. From the FastAPI documentation, we thought this would be enough to make our application asynchronous. However, it doesn't look like it: requests are still processed sequentially, and additionally the application stops responding for minutes. The method also includes an O(n²) nested for loop; we are not sure whether that can cause blocking too.
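A minimal, self-contained sketch of why `async def` alone doesn't help here (pure asyncio, no FastAPI; `cpu_heavy_score` is a hypothetical stand-in for `do_score`, not the actual code): a coroutine running a nested O(n²) loop never reaches an `await`, so a concurrent "heartbeat" coroutine cannot run until the loop finishes.

```python
import asyncio
import time


async def cpu_heavy_score(n: int) -> int:
    # Hypothetical stand-in for do_score(): a nested O(n^2) loop with no await inside.
    total = 0
    for i in range(n):
        for j in range(n):
            total += i * j
    return total


async def heartbeat(ticks: list, interval: float, count: int) -> None:
    # Records a timestamp every `interval` seconds; it only runs when the loop is free.
    for _ in range(count):
        ticks.append(time.perf_counter())
        await asyncio.sleep(interval)


async def main(n: int = 1500) -> tuple:
    ticks: list = []
    hb = asyncio.create_task(heartbeat(ticks, 0.01, 5))
    await asyncio.sleep(0)  # let the heartbeat record its first tick
    start = time.perf_counter()
    await cpu_heavy_score(n)  # blocks the event loop for the whole computation
    busy = time.perf_counter() - start
    await hb
    # Largest gap between heartbeat ticks: at least `busy`, because the loop was blocked.
    max_gap = max(b - a for a, b in zip(ticks, ticks[1:]))
    return busy, max_gap


if __name__ == "__main__":
    busy, max_gap = asyncio.run(main())
    print(f"scoring took {busy:.2f}s, longest heartbeat gap {max_gap:.2f}s")
```

Even though `cpu_heavy_score` is declared `async`, the heartbeat is frozen for the entire computation, which matches the "batches" pattern in the load test.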
I hope the information provided is enough to get started. If you need more information about the code, please tell me, though I will need to rename some variables first. Thank you very much in advance!
Of course something will block your application if it is not fully asynchronous; `async` is just a fancy keyword here.

Even if you define a function with `async def`, if it does something blocking underneath, it will block the execution of your entire app. Not convinced? Test it:
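```python
import time
from fastapi import FastAPI
app = FastAPI()

@app.get("/dummy")
async def dummy():
    time.sleep(5)  # blocking call: stalls the event loop
```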
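Let's send 3 concurrent requests to it:

```shell
for _ in {1..3}; do curl http://127.0.0.1:8000/dummy & done; wait
```

This takes 15+ seconds in total, because the three 5-second sleeps run one after another instead of overlapping.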
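The same effect can be reproduced without FastAPI or curl. A minimal sketch using only asyncio: `blocking_handler` mirrors the `dummy()` endpoint, while `non_blocking_handler` swaps `time.sleep` for `await asyncio.sleep`, which yields to the event loop. Both handler names and the shortened 0.2 s delay are illustrative assumptions.

```python
import asyncio
import time

DELAY = 0.2  # shortened from 5 s for a quick demo


async def blocking_handler() -> None:
    time.sleep(DELAY)  # blocks the whole event loop


async def non_blocking_handler() -> None:
    await asyncio.sleep(DELAY)  # suspends only this coroutine


async def run_concurrently(handler, n: int = 3) -> float:
    # Start n "requests" at once and measure the total wall-clock time.
    start = time.perf_counter()
    await asyncio.gather(*(handler() for _ in range(n)))
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"blocking:     {asyncio.run(run_concurrently(blocking_handler)):.2f}s")  # ~0.6 s
    print(f"non-blocking: {asyncio.run(run_concurrently(non_blocking_handler)):.2f}s")  # ~0.2 s
```

Under a single uvicorn worker, the `/dummy` endpoint behaves like `blocking_handler`.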
Let's dive deeper. Why do I say `async def` is just fancy syntax for declaring a coroutine? See PEP 492:

> `async def` functions are always coroutines, even if they do not contain `await` expressions.
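To see PEP 492's point concretely: calling an `async def` function only creates a coroutine object, and its body runs when an event loop drives it, whether or not it contains `await`. `no_await` is a hypothetical example function.

```python
import asyncio
import inspect


async def no_await() -> str:
    # An async def function with no await expression inside.
    return "done"


if __name__ == "__main__":
    coro = no_await()  # calling it does NOT run the body yet
    print(inspect.iscoroutinefunction(no_await))  # True
    print(inspect.iscoroutine(coro))  # True
    print(asyncio.run(coro))  # done
```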
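When a coroutine reaches an `await`, you are telling the event loop: "I'm waiting for something, go run other work in the meantime." The event loop does exactly that: it switches to another coroutine, runs it, and resumes yours once the awaited result is ready. So an awaiting coroutine doesn't hold up the loop while it waits. A blocking call such as `time.sleep()`, or a long CPU-bound loop, never hands control back, so the event loop and every other request have to wait until it finishes.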
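One way to apply this, sketched under assumptions: keep the handler `async`, but move the blocking work into a worker thread with `asyncio.to_thread` (Python 3.9+), so the event loop stays free to serve other requests. `blocking_score` is a hypothetical stand-in for synchronous scoring code, and `time.sleep` only simulates it. For pure-Python CPU-bound work, threads keep the loop responsive but add no parallelism because of the GIL, so a `concurrent.futures.ProcessPoolExecutor` via `loop.run_in_executor` may fit better.

```python
import asyncio
import time


def blocking_score(x: int) -> int:
    # Hypothetical synchronous scoring function; time.sleep simulates blocking work.
    time.sleep(0.2)
    return x * 2


async def score(x: int) -> int:
    # Runs blocking_score in a worker thread; this coroutine awaits the result
    # while the event loop keeps serving other coroutines.
    return await asyncio.to_thread(blocking_score, x)


async def main() -> tuple:
    start = time.perf_counter()
    results = await asyncio.gather(*(score(i) for i in range(3)))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = asyncio.run(main())
    print(results, f"{elapsed:.2f}s")  # [0, 2, 4] in ~0.2 s instead of ~0.6 s
```

In FastAPI specifically, declaring the path operation with plain `def` instead of `async def` has a similar effect, because FastAPI runs such functions in an external threadpool.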
If the scoring work itself is heavy, you might want to move it to a job/task queue library such as Celery.