
How do Uvicorn workers work, and how many do I need for a slim machine?


The application I deploy is FastAPI with Uvicorn under K8s. While figuring out how to Dockerize the application, I decided to run Uvicorn without Gunicorn and to scale up/down based on the request load the application receives. I did a lot of load testing and discovered that with the default of 1 Uvicorn worker I get 3.5 RPS, while with 8 workers I can easily get 22 RPS (I didn't test higher, since that's a great result for me).

Now, what I was expecting regarding resources is that I would have to provide a CPU limit of 8 (I assume every worker runs in its own process with a single thread), but I only saw an increase in memory usage, and barely any in CPU. Maybe that's because the app doesn't use much CPU, but is it even possible for it to use more than 1 CPU? So far it hasn't used more than one.

How do Uvicorn workers work? How should I calculate how many workers I need for the app? I didn't find any useful information.

Again, my goal is to keep a slim machine of 1 CPU, with an autoscaling system.

[Locust screenshot showing ~20 RPS]

[Grafana screenshot of CPU usage]


Solution

  • When running uvicorn with a --workers argument greater than 1, uvicorn will spawn subprocesses internally using multiprocessing.

    You have to remember that uvicorn is asynchronous, and that HTTP servers are generally bottlenecked by network latency rather than computation. So it could be that your workload is I/O bound rather than CPU bound — which would explain why adding workers increased memory usage but barely moved CPU usage.

    Without knowing more about the type of work being done by the server on each request, the best way to determine how many workers you will need will be through empirical experimentation. In other words, just test it until you hit a limit.

    Though the FastAPI documentation does include some guidance for your use case:

    If you have a cluster of machines with Kubernetes, Docker Swarm Mode, Nomad, or another similar complex system to manage distributed containers on multiple machines, then you will probably want to handle replication at the cluster level instead of using a process manager (like Gunicorn with workers) in each container.

    One of those distributed container management systems like Kubernetes normally has some integrated way of handling replication of containers while still supporting load balancing for the incoming requests. All at the cluster level.

    In those cases, you would probably want to build a Docker image from scratch as explained above, installing your dependencies, and running a single Uvicorn process instead of running something like Gunicorn with Uvicorn workers. - FastAPI Docs

    Emphasis mine.
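
    As a starting point before you experiment, Gunicorn's docs suggest the (2 × num_cores) + 1 rule of thumb for worker counts, and the same heuristic is commonly applied to Uvicorn workers. A quick sketch (the helper name is mine; treat the number it returns as a starting point for load testing, not an answer):

    ```python
    import multiprocessing

    def suggested_workers(cores=None):
        """Gunicorn's rule-of-thumb starting point: (2 x cores) + 1.

        This is only a heuristic for a mixed workload: an I/O-bound
        async app may tolerate more workers per core, while a
        CPU-bound one may want fewer. Load-test to find the real number.
        """
        if cores is None:
            cores = multiprocessing.cpu_count()
        return 2 * cores + 1

    # For a slim 1-CPU pod the heuristic suggests 3 workers. Your own
    # load tests (8 workers -> ~22 RPS on what looks like an I/O-bound
    # app) show why measuring beats any formula.
    print(suggested_workers(1))  # -> 3
    ```

    Note that memory, unlike CPU, scales roughly linearly with worker count, since each subprocess loads its own copy of the application — consistent with what you observed in Grafana.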