amazon-web-services amazon-ecs aws-fargate aws-application-load-balancer

AWS ECS Service restarting because of failed Target Group health check

I have a .NET Core API which runs inside a Docker container. The image is pushed to Amazon ECR, and I run it on ECS with a task definition (this already works).

Snippet from my task definition:

 "portMappings": [
    {
      "hostPort": 50598,
      "protocol": "tcp",
      "containerPort": 50598
    }
  ],

When I start the service, the task runs and it works fine. I get my public IP where I can check if my calls are available:

HTTP 401 is OK here because the call checks for a valid token.

The problem appears when I try to add a load balancer with a target group.

For this I delete my old service and create a new one with a load balancer and a target group.

After I start the service this way, the target group health check responds with "Request Timeout" and ECS keeps restarting my service. When I call the API on its public IP it still works; it only fails when I try to access the API through the load balancer.

Target Group:

Lb:

Solution

You're most likely getting the request timeout because the security group attached to the task does not allow inbound traffic from the load balancer nodes.

By default, even if all targets fail their health checks, the load balancer fails open and still attempts to forward requests to all of them. With inbound traffic blocked by the security group, those requests time out.

As long as the task's security group allows inbound traffic on port 50598 from the load balancer, no timeout should occur. If you're using an ALB, you can add an inbound rule whose source is the security group attached to the load balancer.
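A minimal Python sketch of such an inbound rule, assuming boto3 is used to manage the security groups. The security group IDs are placeholders and the helper names `build_ingress_rule` and `apply_ingress_rule` are made up for illustration; only `authorize_security_group_ingress` is the actual EC2 client call.

```python
def build_ingress_rule(task_sg_id: str, lb_sg_id: str, port: int) -> dict:
    """Build keyword arguments for EC2 authorize_security_group_ingress.

    The rule allows TCP traffic on `port` into the task's security group,
    but only from resources that use the load balancer's security group.
    """
    return {
        "GroupId": task_sg_id,
        "IpPermissions": [
            {
                "IpProtocol": "tcp",
                "FromPort": port,
                "ToPort": port,
                # Source is a security group rather than a CIDR range.
                "UserIdGroupPairs": [{"GroupId": lb_sg_id}],
            }
        ],
    }


def apply_ingress_rule(rule: dict) -> None:
    """Send the rule to AWS. Needs boto3 and credentials; not called here."""
    import boto3  # imported lazily so the sketch loads without boto3

    boto3.client("ec2").authorize_security_group_ingress(**rule)


# Placeholder IDs, not real resources.
rule = build_ingress_rule("sg-task-placeholder", "sg-lb-placeholder", 50598)
```

Calling `apply_ingress_rule(rule)` with real group IDs would add the rule. Using the load balancer's security group as the source avoids hard-coding the IP addresses of the load balancer nodes.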

Once this access is working, you will need to make sure the health check path and the expected HTTP status codes match what your application actually returns.

It is also worth noting that for an Application Load Balancer the success codes can be taken from the range 200 - 499, whereas for a Network Load Balancer health checks can only use 200 - 399.

Your target group must be configured to use port 50598 too; it currently targets port 80.
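To see why the configured success codes matter for an endpoint that answers 401 without a token, here is a small Python sketch that mimics how a matcher string such as "200", "200,401" or "200-499" could be evaluated. The function `status_matches` is hypothetical and only imitates the matcher format; it is not part of any AWS SDK.

```python
def status_matches(status: int, matcher: str) -> bool:
    """Return True if `status` is covered by a matcher such as
    "200", "200,401" or "200-499" (comma-separated values or ranges)."""
    for part in matcher.split(","):
        part = part.strip()
        if "-" in part:
            low, high = part.split("-", 1)
            if int(low) <= status <= int(high):
                return True
        elif part and int(part) == status:
            return True
    return False


# The API in the question answers 401 when no token is supplied.
default_matcher_ok = status_matches(401, "200")      # only 200 counts as healthy
wide_matcher_ok = status_matches(401, "200-499")     # 401 falls inside the range
```

With a matcher of "200" the health check would fail on the 401 response even when the network path is open, so either the matcher has to include 401 or the health check should point at a path that does not require a token.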

Update

The application turned out to be listening on port 80. The configuration for the target group and the task definition was updated to use that port, and it then began working.
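The mismatch described here can be caught with a simple consistency check. The following Python sketch is a simplified illustration: `find_port_mismatches` is a hypothetical helper, and the port the application listens on is assumed to be known from its own configuration.

```python
def find_port_mismatches(app_port, port_mappings, target_group_port):
    """Compare the port the application listens on, the containerPort
    entries of the task definition, and the target group port.
    Returns a list of human-readable problems (empty if consistent)."""
    problems = []
    container_ports = [m["containerPort"] for m in port_mappings]
    if app_port not in container_ports:
        problems.append(
            f"application listens on {app_port}, "
            f"but the task definition maps {container_ports}"
        )
    if target_group_port not in container_ports:
        problems.append(
            f"target group uses port {target_group_port}, "
            f"but the task definition maps {container_ports}"
        )
    return problems


# Situation from the question: the task definition maps 50598,
# while the application and the target group both use port 80.
before = find_port_mismatches(
    80, [{"containerPort": 50598, "hostPort": 50598, "protocol": "tcp"}], 80
)

# After the update all three agree on port 80.
after = find_port_mismatches(
    80, [{"containerPort": 80, "hostPort": 80, "protocol": "tcp"}], 80
)
```

Here `before` contains two problems and `after` is empty, which matches the fix described in the update.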