How do I make sure that one container only tries to handle one request? I am running a Flask API server in my container, but it is not designed to handle multiple requests at the same time.
Right now it seems like multiple requests are being routed to the same pod/container, as I keep getting an OOMKilled status.
Note that this only happens when I send requests in quick succession, e.g. 3 requests with 3 seconds in between.
Note that I am not 100% sure that this is what is happening; I find it difficult to determine where the requests are going in the AKS cluster. If you have any advice on how to monitor this, I would greatly appreciate it!
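So far the only visibility I have is watching restart counts and tailing logs with standard kubectl commands (the `app=my-api` label and deployment name are placeholders for my actual resources):

```shell
# Watch pods in real time; an OOMKilled container shows up as a restart
kubectl get pods -l app=my-api -w

# Confirm the kill reason on a specific pod (shows "Reason: OOMKilled")
kubectl describe pod <pod-name>

# Tail logs from all replicas at once, prefixed with the pod name,
# to see which pod actually served each request
kubectl logs -f -l app=my-api --prefix

# Live memory usage per pod (AKS ships metrics-server by default)
kubectl top pod -l app=my-api
```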
I tried to put the resource request and resource limit to the same value in the deployment.yaml like this:
resources:
  requests:
    cpu: "100m"
    memory: "128Mi"
  limits:
    cpu: "100m"
    memory: "128Mi"
This is not my preferred way to solve the problem, as most of the time my program only needs 32Mi of memory and the full 128Mi is rarely needed.
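One alternative I considered is enforcing the one-request limit inside the app itself, rejecting overlapping requests with a 503 instead of letting them queue up. A rough, untested sketch of what I mean (the class and names are mine, not from my actual code):

```python
import threading


class SingleRequestGate:
    """Reject overlapping requests instead of queueing them.

    try_acquire() returns False immediately if another request is
    already in flight, so the handler can answer 503 and let the
    client (or an upstream load balancer) retry.
    """

    def __init__(self):
        # One permit = at most one request inside the handler at a time
        self._sem = threading.Semaphore(1)

    def try_acquire(self) -> bool:
        # Non-blocking: never makes a second request wait in-process
        return self._sem.acquire(blocking=False)

    def release(self) -> None:
        self._sem.release()


gate = SingleRequestGate()

# Inside a Flask view this would look roughly like:
#   if not gate.try_acquire():
#       return "busy", 503
#   try:
#       ...do the memory-heavy work...
#   finally:
#       gate.release()
```

This keeps the second request's memory out of the pod entirely, but it pushes the retry responsibility to the caller, so I am not sure it is the right trade-off either.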
> It is not designed to handle multiple requests at the same time
Well, then the code is not designed properly 😅 There is a limit to how far you can throw more servers at a code problem.
If I were you, here is what I would do: