amazon-web-services, amazon-ecs, aws-code-deploy, blue-green-deployment

ECS not scaling EC2 instances when doing Blue/Green deployments with CodeDeploy


I want to run a Blue/Green deployment for ECS, using EC2 instances (with managed auto-scaling rules) as the capacity provider.

I pretty much followed the docs here:

https://docs.aws.amazon.com/AmazonECS/latest/developerguide/create-blue-green.html

  1. Create an Application Load Balancer
  2. Create 2 target groups (one each for the blue and green deployments), and attach the first to the load balancer's listener on port 80. These use the instance target type instead of IP addresses, as I am using the bridge network mode.
  3. Create ECS Cluster, but with:
  • Infrastructure: AWS Fargate, Amazon EC2 instances (both enabled)
  • Autoscaling Group: Bottlerocket ARM 64 (Kernel 5.10), t4g.medium, min: 1, max: 5, Volume Size: 20GB
  4. Create a simple task definition (this differs slightly from the docs, which use Fargate; I want to use EC2 with the "bridge" network mode):
{
    "family": "sample-task",
    "containerDefinitions": [
        {
            "name": "web-app",
            "image": "public.ecr.aws/docker/library/httpd:latest",
            "cpu": 0,
            "portMappings": [
                {
                    "name": "web-app-80-tcp",
                    "containerPort": 80,
                    "hostPort": 0,
                    "protocol": "tcp",
                    "appProtocol": "http"
                }
            ],
            "essential": true,
            "entryPoint": [
                "sh",
                "-c"
            ],
            "command": [
                "/bin/sh -c \"echo '<html> <head> <title>HEllO WORLD!</title> <style>body {margin-top: 40px; background-color: #333;} </style> </head><body> <div style=color:white;text-align:center> <h1>New! $RANDOM</h1> <h2>Congratulations!</h2> <p>Your application is now running on a container in Amazon ECS.</p> </div></body></html>' >  /usr/local/apache2/htdocs/index.html && httpd-foreground\""
            ],
            "environment": [],
            "mountPoints": [],
            "volumesFrom": []
        }
    ],
    "taskRoleArn": "arn:aws:iam::XXXXX:role/ecsTaskExecutionRole",
    "executionRoleArn": "arn:aws:iam::XXXXX:role/ecsTaskExecutionRole",
    "networkMode": "bridge",
    "requiresCompatibilities": [
        "EC2"
    ],
    "cpu": "256",
    "memory": "512",
    "runtimePlatform": {
        "cpuArchitecture": "ARM64",
        "operatingSystemFamily": "LINUX"
    }
}
  5. Create a service (this again differs slightly from the docs, which use Fargate):
{
    "cluster": "ecs-blue-green-deployment-ec2",
    "serviceName": "service-bluegreen-ec2",
    "capacityProviderStrategy": [
        {
            "capacityProvider": "Infra-ECS-Cluster-ecs-blue-green-deployment-ec2-xxxx-EC2CapacityProvider-xxxx",
            "weight": 1,
            "base": 0
        }
    ],
    "schedulingStrategy": "REPLICA",
    "taskDefinition": "arn:aws:ecs:region:xxxx:task-definition/sample-task:1",
    "deploymentController": {
        "type": "CODE_DEPLOY"
    },
    "loadBalancers": [
        {
            "targetGroupArn": "arn:aws:elasticloadbalancing:region:xxxxx:targetgroup/blue-green-ecs-ec2-1/xxxxx",
            "containerName": "web-app",
            "containerPort": 80
        }
    ],
    "desiredCount": 1
}
  6. Create an application and deployment group in CodeDeploy with the "All at once" option.

Result

All the above works as expected, and the URL of the load balancer loads the page just fine.

Since I am using t4g.medium, I expect each instance to run a few tasks (about 6) before the cluster scales up.
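
As a rough check on that estimate, the sketch below computes how many copies of the task above could fit on one instance and how many instances 10 tasks would need. It uses the task definition's 256 CPU units and 512 MiB, and the t4g.medium size of 2 vCPUs (2048 CPU units) and 4 GiB. The function names and the 512 MiB set aside for the OS and ECS agent are assumptions for illustration; the memory an instance actually registers with ECS will differ.

```python
import math


def tasks_per_instance(instance_cpu_units, instance_memory_mib,
                       task_cpu_units, task_memory_mib,
                       reserved_memory_mib=0):
    """Return how many tasks fit on one instance, limited by CPU or memory."""
    by_cpu = instance_cpu_units // task_cpu_units
    by_memory = (instance_memory_mib - reserved_memory_mib) // task_memory_mib
    return min(by_cpu, by_memory)


def instances_needed(desired_tasks, per_instance):
    """Return the number of instances required to place all desired tasks."""
    return math.ceil(desired_tasks / per_instance)


# t4g.medium: 2 vCPUs (1024 CPU units each) and 4 GiB of memory.
# reserved_memory_mib=512 is an assumed allowance, not a measured value.
per_instance = tasks_per_instance(
    instance_cpu_units=2048,
    instance_memory_mib=4096,
    task_cpu_units=256,
    task_memory_mib=512,
    reserved_memory_mib=512,
)
print(per_instance, instances_needed(10, per_instance))  # prints: 7 2
```

Under these assumptions memory is the limiting resource, which is consistent with the "insufficient memory" error below: 10 tasks cannot be placed unless a second instance is launched.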

If I update the number of desired tasks to 2, and click force new deployment, everything works as expected.

However, if I set the number of desired tasks to 10, I get the error:

service xxxx was unable to place a task because no container instance met all of its requirements. The closest matching container-instance xxxxxx has insufficient memory available. For more information, see the Troubleshooting section of the Amazon ECS Developer Guide.

It is trying to deploy all the tasks without scaling.

Normally the number of instances scales up, but for some reason ECS on EC2 with Blue/Green deployments does not do that. Note that everything scales up correctly if I use:

  • Fargate with or without Blue/Green Deployment
  • EC2 without Blue/Green Deployment

NOTE: I also tried using awsvpc network type, and had similar errors.

Why doesn't the ECS cluster scale the EC2 instances when doing deployments with CodeDeploy?


Solution

  • This looks like a reported bug in ECS:

    https://github.com/aws/containers-roadmap/issues/713#issuecomment-824349836
    https://github.com/aws/containers-roadmap/issues/1752

    The workaround is to run deployments through CodeDeploy (not through the ECS console) and to add the CapacityProviderStrategy to the AppSpec file (AppSpec.yaml, shown here in its JSON form):

    {
      "version": 1,
      "Resources": [
        {
          "TargetService": {
            "Type": "AWS::ECS::Service",
            "Properties": {
              "TaskDefinition": "arn:aws:ecs:<region>:<MyAccountNumber>:task-definition/<MyTaskDefinitionName>:<MyTaskDefinitionVersion>",
              "LoadBalancerInfo": {
                "ContainerName": "<MyContainerName>",
                "ContainerPort": <MyContainerPort>
              },
              "CapacityProviderStrategy": [
                {
                    "CapacityProvider": "<MyCapacityProviderNameFromECSCluster>",
                    "Base": 0,
                    "Weight": 1
                }
              ]
            }
          }
        }
      ]
    }
    

    Why it does not keep the service's existing capacity provider strategy when this value is omitted is beyond me. I hope this issue gets fixed.
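
    The workaround above can also be scripted. The following is a minimal sketch, not a definitive implementation: it builds the same AppSpec as a Python dict and submits it inline through boto3's CodeDeploy `create_deployment` call. The function names `build_appspec` and `start_deployment` and every value passed to them are hypothetical; substitute your own application, deployment group, task definition ARN, and capacity provider name.

```python
import json


def build_appspec(task_definition_arn, container_name, container_port,
                  capacity_provider):
    """Return an ECS AppSpec dict that pins the capacity provider strategy."""
    return {
        "version": 1,
        "Resources": [
            {
                "TargetService": {
                    "Type": "AWS::ECS::Service",
                    "Properties": {
                        "TaskDefinition": task_definition_arn,
                        "LoadBalancerInfo": {
                            "ContainerName": container_name,
                            "ContainerPort": container_port,
                        },
                        # Without this entry the deployment reportedly loses
                        # the service's capacity provider strategy.
                        "CapacityProviderStrategy": [
                            {
                                "CapacityProvider": capacity_provider,
                                "Base": 0,
                                "Weight": 1,
                            }
                        ],
                    },
                }
            }
        ],
    }


def start_deployment(application_name, deployment_group_name, appspec):
    """Submit the AppSpec inline to CodeDeploy and return the deployment ID.

    Requires boto3 and valid AWS credentials.
    """
    import boto3  # imported lazily so build_appspec works without boto3

    client = boto3.client("codedeploy")
    response = client.create_deployment(
        applicationName=application_name,
        deploymentGroupName=deployment_group_name,
        revision={
            "revisionType": "AppSpecContent",
            "appSpecContent": {"content": json.dumps(appspec)},
        },
    )
    return response["deploymentId"]
```

    A call would look like `start_deployment("my-app", "my-deployment-group", build_appspec(task_arn, "web-app", 80, capacity_provider_name))`, where all four arguments are placeholders for your own values.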