docker · http · kubernetes · azure-aks · horizontal-pod-autoscaling

Limit one container to handling only one request at a time in Azure Kubernetes Service (AKS)


How do I make sure that one container only handles one request at a time? I am running a Flask API server in my container, but it is not designed to handle multiple requests concurrently.

Right now it seems like multiple requests are being routed to one pod/container, as I keep getting an OOMKilled status.

Note that this only happens when I send requests in quick succession, e.g. 3 requests with 3 seconds in between.

Note that I am not 100% sure that this is what is happening; I find it difficult to trace where the requests are going in the AKS cluster. If you have any advice on how to monitor this, I would greatly appreciate it!

I tried setting the resource request and resource limit to the same value in the deployment.yaml, like this:

    resources:
      requests:
        cpu: "100m"
        memory: "128Mi"
      limits:
        cpu: "100m"
        memory: "128Mi"

This is not my preferred way to solve the problem, as my program usually only needs 32Mi of memory and rarely needs the full 128Mi.
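
For reference, something closer to the actual usage pattern would look like this (the 32Mi request is just what I expect the program to need most of the time, not a measured value):

    resources:
      requests:
        cpu: "100m"
        memory: "32Mi"    # what the program needs most of the time
      limits:
        cpu: "100m"
        memory: "128Mi"   # headroom for the occasional spike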


Solution

  • It is not designed to handle multiple requests at the same time

    Well, then the code is not designed properly 😅 There is a limit to how far you can throw more servers at a code problem.

    If I were you, here is what I would do:

    • Fix the code so it can handle several concurrent requests. Maybe you have a memory leak.
    • Increase the memory (double it, and see if it helps)
    • Monitor your app with something like grafana.com to understand why memory usage is increasing.
    • Increase concurrency, e.g. run the Flask app behind a production WSGI server such as gunicorn with multiple workers, rather than the Flask development server.
    • Create an HPA (Horizontal Pod Autoscaler) based on memory: when memory usage rises past a certain threshold, it will increase your pod count (see the HPA sketch after this list).
    • Add a readiness probe and configure it so that if the pod doesn't answer, the load balancer won't send requests to that pod (see the probe sketch below).
    • If you really need to process only one request at a time, use a queue: the API puts an item on a queue for each incoming request, and a worker processes the items one by one.
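
    As a minimal sketch of the memory-based HPA idea, assuming your Deployment is named flask-api and that the metrics server is available in the cluster (AKS ships it by default), something like this scales the pod count when average memory utilization crosses 80% of the requested memory:

        apiVersion: autoscaling/v2
        kind: HorizontalPodAutoscaler
        metadata:
          name: flask-api-hpa           # hypothetical name
        spec:
          scaleTargetRef:
            apiVersion: apps/v1
            kind: Deployment
            name: flask-api             # hypothetical Deployment name, adjust to yours
          minReplicas: 1
          maxReplicas: 5
          metrics:
          - type: Resource
            resource:
              name: memory
              target:
                type: Utilization
                averageUtilization: 80  # scale out when avg memory usage exceeds 80% of requests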
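
    And a sketch of the readiness probe, added to the container spec in your deployment.yaml. The /healthz path and port 5000 are assumptions; use whatever endpoint and port your Flask app actually exposes:

        containers:
        - name: flask-api               # hypothetical container name
          readinessProbe:
            httpGet:
              path: /healthz            # assumed health-check endpoint in the Flask app
              port: 5000                # assumed container port
            initialDelaySeconds: 5
            periodSeconds: 10
            failureThreshold: 3         # after 3 failed checks the pod stops receiving traffic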