We use Openshift 4.x. For an API, min pods is 5 and max is 8. Horizontal autoscaling is configured based on the avg. CPU utilization percentage. The property haproxy.router.openshift.io/pod-concurrent-connections = '10' --> restricts the number of connections to each Pod to 10. What happens if we get more requests to the pod? Does it wait in the queue or does the pods scale up horizontally?
Below is current configuration in Routes for this API:
haproxy.router.openshift.io/disable_cookies: 'true' haproxy.router.openshift.io/balance: roundrobin haproxy.router.openshift.io/pod-concurrent-connections: '10' haproxy.router.openshift.io/timeout: 50s
The HPA will spawn a new pod if the defined CPU limit is reached, not based the number of connections
So to answer your question "What happens if we get more requests to the pod?", the answer is
The HPA may also spawn another pod before that, if the CPU goes above the defined limit. The 2 metrics (# of connections and CPU consumption) are not related in the context of k8s/OCP
Keep in mind that the CPU limit defined in the HPA is the average CPU observed for all the running pods at the time