Tags: amazon-web-services, aws-application-load-balancer, aws-auto-scaling

AWS Auto Scaling Group does not detect an instance that the ELB reports as unhealthy


I’m trying to get an AWS Auto Scaling Group to replace ‘unhealthy’ instances, but I can’t get it to work.

From the console, I’ve created a Launch Configuration and, from there, an Auto Scaling Group with an Application Load Balancer. I’ve left all target group and listener settings at their defaults. I’ve selected ‘ELB’ as an additional health check type for the Auto Scaling Group. I’ve deliberately misconfigured the Launch Configuration so that it produces ‘broken’ instances: no web server is listening on the port configured in the listener.

The Auto Scaling Group seems to be configured correctly and is definitely aware of the load balancer. However, it thinks the instance it has spun up is healthy.

// output of aws autoscaling describe-auto-scaling-groups:

{
    "AutoScalingGroups": [
        {
            "AutoScalingGroupName": "MyAutoScalingGroup",
            "AutoScalingGroupARN": "arn:aws:autoscaling:eu-west-1:<accountId>:autoScalingGroup:3edc728f-0831-46b9-bbcc-16691adc8f44:autoScalingGroupName/MyAutoScalingGroup",
            "LaunchConfigurationName": "MyLaunchConfiguration",
            "MinSize": 1,
            "MaxSize": 3,
            "DesiredCapacity": 1,
            "DefaultCooldown": 300,
            "AvailabilityZones": [
                "eu-west-1b",
                "eu-west-1c",
                "eu-west-1a"
            ],
            "LoadBalancerNames": [],
            "TargetGroupARNs": [
                "arn:aws:elasticloadbalancing:eu-west-1:<accountId>:targetgroup/MyAutoScalingGroup-1/1e36c863abaeb6ff"
            ],
            "HealthCheckType": "ELB",
            "HealthCheckGracePeriod": 300,
            "Instances": [
                {
                    "InstanceId": "i-0b589d33100e4e515",
                    // ...
                    "LifecycleState": "InService",
                    "HealthStatus": "Healthy",
                    // ...
                }
            ],
            // ...
        }
    ]
}

The load balancer, however, is very much aware that the instance is unhealthy:

// output of aws elbv2 describe-target-health:

{
    "TargetHealthDescriptions": [
        {
            "Target": {
                "Id": "i-0b589d33100e4e515",
                "Port": 80
            },
            "HealthCheckPort": "80",
            "TargetHealth": {
                "State": "unhealthy",
                "Reason": "Target.Timeout",
                "Description": "Request timed out"
            }
        }
    ]
}
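To spot this kind of disagreement in a script, here is a minimal standard-library Python sketch that cross-references the two outputs above. It assumes each command's output has been saved and parsed with `json.load`. The real CLI output is plain JSON, unlike the abridged versions above with `// ...` comments. The name `find_health_mismatches` is hypothetical, and counting any target state other than `healthy` as a mismatch is a simplification. For example, a target in the `initial` state would also be flagged.

```python
from typing import Any


def find_health_mismatches(
    asg_doc: dict[str, Any], target_health_doc: dict[str, Any]
) -> list[str]:
    """Return IDs of instances the ASG calls Healthy while the target group does not.

    asg_doc: parsed output of `aws autoscaling describe-auto-scaling-groups`
    target_health_doc: parsed output of `aws elbv2 describe-target-health`
    """
    # Instance ID -> target state reported by the load balancer ("healthy", "unhealthy", ...)
    elb_state = {
        desc["Target"]["Id"]: desc["TargetHealth"]["State"]
        for desc in target_health_doc.get("TargetHealthDescriptions", [])
    }
    mismatches = []
    for group in asg_doc.get("AutoScalingGroups", []):
        for inst in group.get("Instances", []):
            iid = inst["InstanceId"]
            # Only instances registered in the target group can disagree with it.
            if (
                iid in elb_state
                and inst.get("HealthStatus") == "Healthy"
                and elb_state[iid] != "healthy"
            ):
                mismatches.append(iid)
    return mismatches
```

With the two outputs from this question, the function returns `["i-0b589d33100e4e515"]`.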

Did I misunderstand the documentation? If not, what else needs to be done to get the Auto Scaling Group to recognize that this instance is unhealthy and replace it?

To be clear, when I mark instances unhealthy manually (i.e. using aws autoscaling set-instance-health), they are replaced as expected.


Solution

    Explanation

    If you deliberately misconfigured the instance from the start, so that the ELB health check has never passed, then the Auto Scaling Group has not yet acknowledged that your ELB/target group is up and running. See this page of the documentation.

    After at least one registered instance passes the health checks, it enters the InService state.

    And

    If no registered instances pass the health checks (for example, due to a misconfigured health check), ... Amazon EC2 Auto Scaling doesn't terminate and replace the instances.
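The documented rule can be summarized as a small decision function. This is a simplified model for illustration only, not AWS's actual implementation. The `AsgView` type and `would_replace_on_elb_health` function are hypothetical and not part of any AWS SDK. The model encodes only the conditions discussed here: the ELB health check type, the target group's state in the ASG, and the health check grace period. As a further simplification, it treats only the `unhealthy` target state as a failure.

```python
from dataclasses import dataclass


@dataclass
class AsgView:
    """Hypothetical snapshot of the facts relevant to ELB-based replacement."""
    health_check_type: str       # HealthCheckType: "EC2" or "ELB"
    attachment_state: str        # State from describe-load-balancer-target-groups
    seconds_since_launch: float  # how long the instance has been running
    grace_period: int            # HealthCheckGracePeriod, in seconds
    elb_target_state: str        # TargetHealth.State from describe-target-health


def would_replace_on_elb_health(v: AsgView) -> bool:
    """Simplified model of when an ELB health check failure leads to replacement."""
    if v.health_check_type != "ELB":
        return False  # ELB results are ignored; only EC2 status checks count
    if v.attachment_state != "InService":
        return False  # no registered instance has ever passed, so ELB health is not acted on
    if v.seconds_since_launch < v.grace_period:
        return False  # still inside the health check grace period
    return v.elb_target_state == "unhealthy"
```

With the question's setup (`ELB`, attachment `Added`, target `unhealthy`), the model returns `False`. That matches what the asker observed.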

    I set this up from scratch and reproduced the behavior you described. To verify that this is the root cause, check the state of the target group in the ASG: it is probably Added rather than InService.

    [cloudshell-user@ip-10-0-xx-xx ~]$ aws autoscaling describe-load-balancer-target-groups --auto-scaling-group-name test-asg
    {
        "LoadBalancerTargetGroups": [
            {
                "LoadBalancerTargetGroupARN": "arn:aws:elasticloadbalancing:us-east-1:xxx:targetgroup/asg-test-1/abc",
                "State": "Added"
            }
        ]
    }

    Resolution

    To get the desired behavior, I did the following:

    1. Run a simple web service on port 80. Make sure the security group allows the ELB to reach the EC2 instance on that port.
    2. Wait until the target shows as healthy in the target group. Make sure the server returns HTTP 200; you may need to create an empty index.html just to pass the health check.
    3. Wait until the target group's state in the ASG has become InService.
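For steps 1 and 2, any HTTP server that answers with a 200 will do. Below is a minimal standard-library Python sketch. It assumes the target group still uses the default health check: HTTP on the traffic port, path `/`, success code 200. The names `HealthCheckHandler` and `make_server` are illustrative.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthCheckHandler(BaseHTTPRequestHandler):
    """Answers every GET/HEAD with 200 and an empty body."""

    def do_GET(self) -> None:
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_HEAD(self) -> None:
        self.do_GET()  # same status and headers; there is no body to send anyway

    def log_message(self, format: str, *args: object) -> None:
        pass  # suppress per-request logging


def make_server(port: int = 80) -> HTTPServer:
    # Listening on port 80 normally requires root (or CAP_NET_BIND_SERVICE).
    return HTTPServer(("0.0.0.0", port), HealthCheckHandler)
```

On the instance, `make_server(80).serve_forever()` keeps it running. Alternatively, running `python3 -m http.server 80` in a directory that contains an empty `index.html` also serves `/` with a 200 response.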

    For example, for Step 3:

    [cloudshell-user@ip-10-0-xx-xx ~]$ aws autoscaling describe-load-balancer-target-groups --auto-scaling-group-name test-asg
    {
        "LoadBalancerTargetGroups": [
            {
                "LoadBalancerTargetGroupARN": "arn:aws:elasticloadbalancing:us-east-1:xxx:targetgroup/test-asg-1-alb/abcdef",
                "State": "InService"
            }
        ]
    }
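Step 3 can also be checked in a script. This sketch assumes the JSON from `aws autoscaling describe-load-balancer-target-groups` has been parsed into a dict. `pending_target_groups` is a hypothetical helper that lists every attachment whose state is anything other than `InService`, such as the `Added` state seen earlier.

```python
from typing import Any


def pending_target_groups(doc: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (ARN, state) for each target group attachment not yet InService.

    doc: parsed output of `aws autoscaling describe-load-balancer-target-groups`.
    Per the documentation quoted above, an attachment that never reaches
    InService (e.g. stuck in "Added") does not cause instance replacement.
    """
    return [
        (tg["LoadBalancerTargetGroupARN"], tg["State"])
        for tg in doc.get("LoadBalancerTargetGroups", [])
        if tg.get("State") != "InService"
    ]
```

An empty result means every attached target group is InService, so ELB health checks should now be able to trigger replacement.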
    

    Now that it is InService, stop the web server and wait. Check frequently, though: as soon as the ASG detects that the instance is unhealthy, it terminates it.

    [cloudshell-user@ip-10-0-xx-xx ~]$ aws autoscaling describe-auto-scaling-groups
    {
        "AutoScalingGroups": [
            {
                "AutoScalingGroupName": "test-asg",
                "AutoScalingGroupARN": "arn:aws:autoscaling:us-east-1:xxx:autoScalingGroup:abc-def-ghi:autoScalingGroupName/test-asg",
                ...
                "LoadBalancerNames": [],
                "TargetGroupARNs": [
                    "arn:aws:elasticloadbalancing:us-east-1:xxx:targetgroup/test-asg-1-alb/abc"
                ],
                "HealthCheckType": "ELB",
                "HealthCheckGracePeriod": 300,
                "Instances": [
                    {
                        "InstanceId": "i-04bed6ef3b2000326",
                        "InstanceType": "t2.micro",
                        "AvailabilityZone": "us-east-1b",
                        "LifecycleState": "Terminating",
                        "HealthStatus": "Unhealthy",
                        "LaunchTemplate": {
                            "LaunchTemplateId": "lt-0452c90319362cbc5",
                            "LaunchTemplateName": "test-template",
                            "Version": "1"
                        },
                        ...
                    }
                ],
                ...
            }
        ]
    }
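The advice to check frequently can be automated with a small polling loop. In this sketch, `fetch` is a hypothetical callable you supply that returns the parsed output of `aws autoscaling describe-auto-scaling-groups`. Passing in `fetch` and `sleep` as parameters keeps the loop testable. The 15-second interval and 15-minute timeout are arbitrary defaults.

```python
import time
from typing import Any, Callable, Optional


def wait_for_unhealthy(
    fetch: Callable[[], dict[str, Any]],
    instance_id: str,
    timeout: float = 900.0,
    interval: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict[str, Any]]:
    """Poll until the ASG reports instance_id as Unhealthy.

    Returns that instance's entry, or None once `timeout` seconds of sleeping
    have passed without seeing it marked Unhealthy.
    """
    waited = 0.0
    while True:
        for group in fetch().get("AutoScalingGroups", []):
            for inst in group.get("Instances", []):
                if inst["InstanceId"] == instance_id and inst.get("HealthStatus") == "Unhealthy":
                    return inst
        if waited >= timeout:
            return None
        sleep(interval)
        waited += interval
```

A real `fetch` could run `aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names test-asg` through `subprocess.run(..., capture_output=True, check=True)` and pass its `stdout` to `json.loads`. Polling frequently matters here. Once termination finishes, the instance disappears from the `Instances` list, and the loop would then simply time out.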