linkerd

Unable to prevent requests reaching the endpoint after circuit breaking is in action


I am trying to verify linkerd's circuit breaking configuration by sending requests through linkerd to a simple error-prone endpoint, deployed as a pod in the same k8s cluster where linkerd runs as a daemonset.

I can see in the logs that circuit breaking is triggered, but when I hit the endpoint again I still receive a response from it.

Setup and Test

I used the configs below to set up linkerd and the endpoint:

https://raw.githubusercontent.com/linkerd/linkerd-examples/master/k8s-daemonset/k8s/linkerd-egress.yaml

https://raw.githubusercontent.com/zillani/kubex/master/examples/simple-err.yml

Endpoint behaviour:

The endpoint always returns a 500 Internal Server Error.

Failure accrual setting: default. responseClassifier: retryable5XX

Proxy curl:

http_proxy=$(kubectl get svc l5d -o jsonpath="{.status.loadBalancer.ingress[0].*}"):4140 curl -L http://<loadbalancer-ingress>:8080/simple-err

Observations

1. At the Admin Metrics

  "rt/outgoing/client/$/io.buoyant.rinet/8080/<loadbalancer-ingress>/connects" : 505,
  "rt/outgoing/client/$/io.buoyant.rinet/8080/<loadbalancer-ingress>/dtab/size.count" : 0,
  "rt/outgoing/client/$/io.buoyant.rinet/8080/<loadbalancer-ingress>/failed_connect_latency_ms.count" : 0,
  "rt/outgoing/client/$/io.buoyant.rinet/8080/<loadbalancer-ingress>/failure_accrual/probes" : 8,
  "rt/outgoing/client/$/io.buoyant.rinet/8080/<loadbalancer-ingress>/failure_accrual/removals" : 2,
  "rt/outgoing/client/$/io.buoyant.rinet/8080/<loadbalancer-ingress>/failure_accrual/removed_for_ms" : 268542,
  "rt/outgoing/client/$/io.buoyant.rinet/8080/<loadbalancer-ingress>/failure_accrual/revivals" : 0,
  "rt/outgoing/client/$/io.buoyant.rinet/8080/<loadbalancer-ingress>/failures" : 505,
  "rt/outgoing/client/$/io.buoyant.rinet/8080/<loadbalancer-ingress>/failures/com.twitter.finagle.service.ResponseClassificationSyntheticException" : 505,
  "rt/outgoing/client/$/io.buoyant.rinet/8080/<loadbalancer-ingress>/loadbalancer/adds" : 2,
  "rt/outgoing/client/$/io.buoyant.rinet/8080/<loadbalancer-ingress>/loadbalancer/algorithm/p2c_least_loaded" : 1.0,
  "rt/outgoing/client/$/io.buoyant.rinet/8080/<loadbalancer-ingress>/loadbalancer/available" : 2.0,

 "rt/outgoing/service/svc/<loadbalancer-ingress>:8080/failures" : 5,
  "rt/outgoing/service/svc/<loadbalancer-ingress>:8080/failures/com.twitter.finagle.service.ResponseClassificationSyntheticException" : 5,
  "rt/outgoing/service/svc/<loadbalancer-ingress>:8080/pending" : 0.0,
  "rt/outgoing/service/svc/<loadbalancer-ingress>:8080/request_latency_ms.count" : 0,
  "rt/outgoing/service/svc/<loadbalancer-ingress>:8080/requests" : 5,
  "rt/outgoing/service/svc/<loadbalancer-ingress>:8080/retries/budget" : 100.0,
  "rt/outgoing/service/svc/<loadbalancer-ingress>:8080/retries/budget_exhausted" : 5,
  "rt/outgoing/service/svc/<loadbalancer-ingress>:8080/retries/per_request.count" : 0,
  "rt/outgoing/service/svc/<loadbalancer-ingress>:8080/retries/total" : 500,
  "rt/outgoing/service/svc/<loadbalancer-ingress>:8080/success" : 0,

2. At the log

I 0518 10:31:15.816 UTC THREAD23 TraceId:e57aa1baa5148cc5: FailureAccrualFactory marking connection to "$/io.buoyant.rinet/8080/<loadbalancer-ingress>" as dead.
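
A rough way to cross-check the counters above is sketched below. The values are copied from the metrics listing, but the shortened key names (`client/...`, `service/...`) and the helper functions are made up for illustration and are not part of linkerd. One reading, which is an assumption rather than something the metrics state outright, is that every original request and every retry reached the client stack and failed: 5 requests plus 500 retries gives the 505 client failures. The failure accrual counters also show removals and probes but no revivals.

```python
# Values copied from the admin metrics above. The short keys are hypothetical
# stand-ins for the full "rt/outgoing/client/..." and "rt/outgoing/service/..." names.
metrics = {
    "client/failures": 505,
    "client/failure_accrual/removals": 2,
    "client/failure_accrual/revivals": 0,
    "client/failure_accrual/probes": 8,
    "service/requests": 5,
    "service/retries/total": 500,
}


def client_attempts(m):
    """Original requests plus retries, i.e. attempts handed to the client stack."""
    return m["service/requests"] + m["service/retries/total"]


def accrual_summary(m):
    """Collect only the failure accrual counters, keyed by their last path segment."""
    return {
        key.rsplit("/", 1)[-1]: value
        for key, value in m.items()
        if "/failure_accrual/" in key
    }


if __name__ == "__main__":
    print(client_attempts(metrics) == metrics["client/failures"])  # prints True
    print(accrual_summary(metrics))
```

If this reading is right, the endpoint was removed by failure accrual but requests kept being sent to it, which matches the behaviour described in the Problem section.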

Problem

After the node is marked as dead, a new request through linkerd (the same http_proxy command as above) still hits the endpoint and returns its response.


Solution

  • This question was answered on the Linkerd community forum. Adding the answer here as well for the sake of completeness:

    When failure accrual (the circuit breaker) triggers, the endpoint is put into a state called Busy. This does not guarantee that the endpoint won't be used. Most load balancers (including the default P2CLeastLoaded) simply pick the healthiest endpoint available. If failure accrual has triggered on all endpoints, the balancer has no healthier choice and has to pick one that is in the Busy state.
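
    The behaviour described above can be illustrated with a small simulation. This is a simplified sketch of a "power of two choices" picker, not linkerd's or Finagle's actual implementation: the `Status`, `Endpoint`, and `p2c_pick` names are invented for this example, and the real balancer has more logic. It only shows the key point that a picker preferring the healthiest candidate still returns a Busy endpoint when every endpoint is Busy.

```python
import random
from dataclasses import dataclass
from enum import IntEnum


class Status(IntEnum):
    # Hypothetical ordering for this sketch: a higher value means healthier.
    CLOSED = 0
    BUSY = 1
    OPEN = 2


@dataclass
class Endpoint:
    name: str
    status: Status
    load: int = 0  # outstanding requests, used as a tie-breaker


def p2c_pick(endpoints, rng):
    """Sample two distinct endpoints and return the healthier one.

    If both have the same status, fall back to the one with the lower load.
    There is no "refuse to pick" branch, so Busy endpoints remain eligible.
    """
    a, b = rng.sample(endpoints, 2)
    if a.status != b.status:
        return a if a.status > b.status else b
    return a if a.load <= b.load else b


if __name__ == "__main__":
    rng = random.Random(0)
    # Both endpoints tripped failure accrual, as in the question.
    all_busy = [Endpoint("a", Status.BUSY), Endpoint("b", Status.BUSY)]
    print(p2c_pick(all_busy, rng).status.name)  # prints BUSY
```

    With at least one healthy endpoint in the pool, the same picker would route around the Busy one, so in this sketch the circuit breaker only diverts traffic when there is somewhere else to send it.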