Security groups:

- ALB (inbound rules):
  - HTTPS:443 from 0.0.0.0/0 & ::/0
  - HTTP:80 from 0.0.0.0/0 & ::/0
- Cluster (inbound rules): <...>

Cluster: us-east-1<a,b,c> (under default VPC with public IP enabled)

Task definitions:

- Client: 0.375 vCPU / 0.25 GB, 1 task, bridge network, 0:3000 (host:container)
- Server: 0.25 vCPU / 0.25 GB, 2 tasks, bridge network, 0:5000 (host:container)

ALB: us-east-1<a,b,c>, same default VPC

Listeners:

- HTTP:80 → redirect to HTTPS://#{host}:443/#{path}?#{query}
- HTTPS:443 (/) → forward to client target group
- HTTPS:443 (/api) → forward to server target group

Health checks (target groups):

- Client: HTTP, path /, Traffic Port, healthy threshold 5, unhealthy threshold 2, 5s timeout, 30s interval, success code 200
- Server: HTTP, path /api/health, Traffic Port, healthy threshold 5, unhealthy threshold 2, 5s timeout, 30s interval, success code 200
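For concreteness, the server target group's health check settings map to roughly the following. This is just a sketch using boto3; the target group ARN is a placeholder:

```python
import boto3

elbv2 = boto3.client("elbv2")

# Server target group health check, matching the settings listed above.
# The ARN is a placeholder.
elbv2.modify_target_group(
    TargetGroupArn="arn:aws:elasticloadbalancing:us-east-1:<account>:targetgroup/server/<id>",
    HealthCheckProtocol="HTTP",
    HealthCheckPath="/api/health",
    HealthCheckPort="traffic-port",  # check on the dynamically mapped host port
    HealthyThresholdCount=5,
    UnhealthyThresholdCount=2,
    HealthCheckTimeoutSeconds=5,
    HealthCheckIntervalSeconds=30,
    Matcher={"HttpCode": "200"},     # expect 200 OK
)
```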
Both Docker images, client and server, work properly locally, and the client service seems to work well in AWS ECS. However, the server service keeps cycling between registering and deregistering (draining) its targets, seemingly without ever becoming unhealthy.

Here is what I see in the service's Deployments and events tab:
5/12/2022, 8:43:04 PM service server registered 2 targets in target-group <...>
5/12/2022, 8:42:54 PM service server has started 2 tasks: task <...> task <...>. <...>
5/12/2022, 8:42:51 PM service server deregistered 1 targets in target-group <...>
5/12/2022, 8:42:51 PM service server has begun draining connections on 1 tasks. <...>
5/12/2022, 8:42:51 PM service server deregistered 1 targets in target-group <...>
5/12/2022, 8:42:17 PM service server registered 2 targets in target-group <...>
5/12/2022, 8:42:07 PM service server has started 2 tasks: task <...> task <...>. <...>
5/12/2022, 8:42:04 PM service server deregistered 1 targets in target-group <...>
5/12/2022, 8:42:04 PM service server has begun draining connections on 1 tasks. <...>
5/12/2022, 8:42:04 PM service server deregistered 1 targets in target-group <...>
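(The events above can also be pulled outside the console; a quick boto3 sketch, with the cluster and service names as placeholders:)

```python
import boto3

ecs = boto3.client("ecs")

# Cluster and service names are placeholders.
resp = ecs.describe_services(cluster="my-cluster", services=["server"])
for event in resp["services"][0]["events"]:
    print(event["createdAt"], event["message"])
```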
Any ideas?
After enabling AWS CloudWatch logs in my task definition's container specs, I was able to see that the issue was actually with an AWS RDS instance.
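The relevant piece is the logConfiguration block in the container definition. A minimal boto3 sketch; the family, image, and log group names are placeholders, and the awslogs log group must already exist (or awslogs-create-group must be set):

```python
import boto3

ecs = boto3.client("ecs")

# Minimal server task definition with CloudWatch logging enabled.
# Family, image, and log group names are placeholders.
ecs.register_task_definition(
    family="server",
    networkMode="bridge",
    requiresCompatibilities=["EC2"],
    cpu="256",     # 0.25 vCPU
    memory="256",  # 0.25 GB
    containerDefinitions=[{
        "name": "server",
        "image": "<account>.dkr.ecr.us-east-1.amazonaws.com/server:latest",
        "essential": True,
        "portMappings": [{"hostPort": 0, "containerPort": 5000, "protocol": "tcp"}],
        # Send the container's stdout/stderr to CloudWatch Logs, which is
        # what surfaced the RDS connection errors.
        "logConfiguration": {
            "logDriver": "awslogs",
            "options": {
                "awslogs-group": "/ecs/server",
                "awslogs-region": "us-east-1",
                "awslogs-stream-prefix": "ecs",
            },
        },
    }],
)
```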
The RDS instance's SG was accepting traffic only from an old cluster SG (which no longer exists), which clears up why the health checks never succeeded and the registered targets were draining immediately.
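The fix was to point the RDS SG's inbound rule at the current cluster SG instead. A boto3 sketch, with the group IDs and database port as placeholders:

```python
import boto3

ec2 = boto3.client("ec2")

# Allow the *current* cluster SG to reach the database.
# Group IDs and the DB port (5432 here) are placeholders.
ec2.authorize_security_group_ingress(
    GroupId="sg-0aaaaaaaaaaaaaaaa",  # RDS instance's SG
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 5432,
        "ToPort": 5432,
        "UserIdGroupPairs": [{"GroupId": "sg-0bbbbbbbbbbbbbbbb"}],  # current cluster SG
    }],
)
```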