I read about Knative's private and public services. The private service always points to the actual deployment's endpoints, while the public service can point either to wherever the private service is pointing or to the activator.
But in my case the public service always points to the activator, no matter whether we are in serve mode or proxy mode, and things still work fine. Please check the image below; 10.24.3.16:8012 is the activator endpoint:
In scaled-down mode (pod count is zero), please check helloworld-go-00001:
In scaled-up mode (serve mode), when the pod count is more than 0:
Please help me understand what I am missing.
You're noticing an optimization added last year: with small amounts of traffic (basically, fewer than 10-15 pods), the activator can often perform better request-weighted load balancing than the typical ingress, in terms of queueing and managing concurrencyCount for existing pods and routing delayed requests to new pods or to existing pods which have become available.
If your service scales up to 20 or 30 pods, you should see the activator stop being in the traffic path. I believe the cutover point is trafficBurstCapacity / ((1.0 - targetCapacity) * concurrencyCount) pods, but I may be mistaken. If I recall correctly, this works out to something like 200 / (0.3 * 80), which is a little more than 8 pods, but I haven't looked in a while.
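To make the arithmetic concrete, here is a minimal Python sketch that evaluates the formula as recalled above. The function name cutover_ratio and its parameter names are made up for illustration and are not Knative APIs; the numbers (200, a target capacity of 0.7 so that 1.0 - 0.7 = 0.3, and 80) are the ones recalled in this answer, not values read from a cluster, and the real autoscaler logic may differ.

```python
import math


def cutover_ratio(burst_capacity: float, target_capacity: float, concurrency: float) -> float:
    """Evaluate trafficBurstCapacity / ((1.0 - targetCapacity) * concurrencyCount)."""
    # Headroom each pod keeps free when running at its target capacity.
    spare_per_pod = (1.0 - target_capacity) * concurrency
    if spare_per_pod <= 0:
        raise ValueError("each pod needs some spare capacity for the formula to apply")
    return burst_capacity / spare_per_pod


# Values as recalled in the answer (assumptions, not read from a live cluster).
ratio = cutover_ratio(burst_capacity=200, target_capacity=0.7, concurrency=80)
print(f"cutover ratio: {ratio:.2f}")
print(f"first whole pod count above the ratio: {math.floor(ratio) + 1}")
```

With these inputs the ratio comes out a little above 8, so under this reading the activator would be expected to leave the traffic path once the revision has 9 or more pods.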
The way this is implemented in the apiserver is that the Knative autoscaler manages the endpoints for the helloworld-go-00001 service directly, using metrics from the activator and queue-proxy for details.