Search code examples
amazon-web-servicesamazon-ecsaws-fargate

ECS Fargate + Network Load Balancer Healthcheck


I'm experiencing an issue with the following setup:

API Gateway -> VPC Link -> Private NLB -> Target Group -> AWS ECS Fargate

If I setup the NLB's Health Check to be TCP/HTTP on a specified endpoint, that endpoint gets hammered to the death with internal request (no requests are coming through the API Gateway, I checked): Relevant picture, note the delay between Health Checks

My problem with this behaviour, other than having the health's endpoint spammed by my own architecture is that the application's functionality is suffering (I keep getting slow responses 1 out of 4 get request to the API).

I tried to modify the Health Check's behaviour to only TCP, same slow responses.

I tried temporarily switching to a public ALB, I'm incurring in double health-checks, separated by 30 seconds but my application is responding with an average of 100 ms.

So, as an example of what I mean by "double health-checks":

Health Check 1.1 at 00:00:00

Health Check 2.1 at 00:00:10

Health Check 1.2 at 00:00:30

Health Check 2.2 at 00:00:40

Any ideas?


Solution

  • TL/DR;

    Enable the "Cross-Zone Load Balancing" NLB flag.

    The issue was the "cross-availability zone" not checked out. It seems that when a request gets processed by a NLB-node which resides in a different AZ from the one that it is trying to be redirecting, it tries to internally resolve the IP in the AZ, if it fails, it redirects the request to another NLB-node in the appropriate AZ, which will be able to do so, hence reaching the target.