python-3.x google-cloud-platform gunicorn google-cloud-run

Cloud Run Qs :: max-instances + concurrency + threads (gunicorn thread)

(I'm learning Cloud Run acknowledge this is not development or code related, but hoping some GCP engineer can clarify this)

I have a PY application running - gunicorn + Flask... just PoC for now, that's why minimal configurations.

cloud run deploy has following flags:

--max-instances 1
--concurrency 5
--memory 128Mi
--platform managed

guniccorn_cfg.py files has following configurations:

workers=1
worker_class="gthread"
threads=3

I'd like to know:

1) max-instances :: if I were to adjust this, does that mean a new physical server machine is provisioned whenever needed ? Or, does the service achieve that by pulling a container image and simply starting a new container instance (docker run ...) on same physical server machine, effectively sharing the same physical machine as other container instances?

2) concurrency :: does one running container instance receive multiple concurrent requests (5 concurrent requests processed by 3 running container instances for ex.)? or does each concurrent request triggers starting new container instance (docker run ...)

3) lastly, can I effectively reach concurrency > 5 by adjusting gunicorn thread settings ? for ex. 5x3=15 in this case.. for ex. 15 concurrent requests being served by 3 running container instances for ex.? if that's true any pros/cons adjusting thread vs adjusting cloud run concurrency?

additional info: - It's an IO intensive application (not the CPU intensive). Simply grabbing the HTTP request and publishing to pubsub/sub

thanks a lot

Solution

First of all, it's not appropriate on Stackoverflow to ask "cocktail questions" where you ask 5 things at a time. Please limit to 1 question at a time in the future.

You're not supposed to worry about where containers run (physical machines, VMs, ...). --max-instances limit the "number of container instances" that you allow your app to scale. This is to prevent ending up with a huge bill if someone was maliciously sending too many requests to your app.
This is documented at https://cloud.google.com/run/docs/about-concurrency. If you specify --concurrency=10, your container can be routed to have at most 10 in-flight requests at a time. So make sure your app can handle 10 requests at a time.
Yes, read Gunicorn documentation. Test if your setting "locally" lets gunicorn handle 5 requests at the same time... Cloud Run’s --concurrency setting is to ensure you don't get more than 5 requests to 1 container instance at any moment.

I also recommend you to read the officail docs more thoroughly before asking, and perhaps also the cloud-run-faq once which pretty much answers all these.