Summary: I can't get ECS Fargate autoscaling to work with CloudFormation, even duplicating a console manual setup.
In AWS ECS I have manually set up a single cluster with a single service and a single task, using a load balancer with autoscaling. To make testing easy I use ALBRequestCountPerTarget with a target value of 3 and scale-in/scale-out cooldowns of 30 seconds. This works! When I reload a page over and over, the service scales the number of tasks. Furthermore, even before autoscaling kicks in, in CloudWatch I can see the request counts go up for the TargetTracking high and low alarms.
I'm trying to replicate this in CloudFormation. My CloudFormation stack produces what looks like an identical configuration. But there's one problem: the service never autoscales, and the two TargetTracking CloudWatch alarms never show any information; they continue to show "Insufficient data" (with no data points whatsoever) no matter how many times I reload the web page that is hitting the load balancer.
I have verified via the console that the service definition and autoscaling configuration are identical in every aspect I can think of. The CloudFormation scaling policy even creates the appropriate CloudWatch alarms for me; those, too, are configured identically to the ones I created manually via the console. They point to the correct load balancer and target group (the ones created by the CloudFormation template). But these CloudWatch alarms receive no data.
If I open the stack's load balancer, under Monitoring I can see it is receiving requests. Somehow this data is not getting to CloudWatch. In CloudWatch the appropriate log group was created, but the log stream only has Spring startup output; then again, that's all the other log group (from the manually created cluster) has as well, so I don't see how that could be relevant. Besides, the CloudWatch data on which ALBRequestCountPerTarget is based should be coming from the load balancer, no?
Here is my log group definition in CloudFormation:
  LogGroup:
    Type: AWS::Logs::LogGroup
    Properties:
      LogGroupName: '/ecs/foo-bar'
      RetentionInDays: 400
And here is the task container definition log configuration:
      LogConfiguration:
        LogDriver: awslogs
        Options:
          awslogs-group: !Ref LogGroup
          awslogs-region: !Ref AWS::Region
          awslogs-stream-prefix: 'ecs'
I'm not sure the log group is even relevant, though, since I understand these CloudWatch metrics are based on the load balancer, not on the task container output.
Here's the autoscaling definition:
  AutoScalingTarget:
    Type: AWS::ApplicationAutoScaling::ScalableTarget
    Properties:
      ServiceNamespace: ecs
      ResourceId: !Sub "service/${ECSCluster}/${ECSService.Name}"
      ScalableDimension: ecs:service:DesiredCount
      MinCapacity: 1
      MaxCapacity: 3
      RoleARN: !Sub "arn:aws:iam::${AWS::AccountId}:role/aws-service-role/ecs.application-autoscaling.amazonaws.com/AWSServiceRoleForApplicationAutoScaling_ECSService"
  AutoScalingPolicy:
    Type: AWS::ApplicationAutoScaling::ScalingPolicy
    Properties:
      PolicyName: foo-bar-policy
      PolicyType: TargetTrackingScaling
      ScalingTargetId: !Ref AutoScalingTarget
      TargetTrackingScalingPolicyConfiguration:
        PredefinedMetricSpecification:
          PredefinedMetricType: ALBRequestCountPerTarget
          ResourceLabel: !Sub "${LoadBalancer.LoadBalancerFullName}}/${TargetGroup.TargetGroupFullName}"
        ScaleInCooldown: 30
        ScaleOutCooldown: 30
        TargetValue: 3
Here's something strange, but probably unrelated: when I added autoscaling using the above snippet but with a PolicyName of "Foo Bar Policy" (with spaces), the per-task log under Clusters > Services > Tasks, "Logs" tab, never showed up. The log group was still created and showed up under CloudWatch, and its stream still showed the container startup output. Changing the PolicyName to foo-bar-policy without spaces brought back the per-task startup log in the console. In any case I doubt this is even relevant to the nonfunctioning autoscaling, as my autoscaling configuration is based on load balancer requests.
Update: If I browse CloudWatch metrics, I can see a list of RequestCountPerTarget metrics. Two of them are for my manually-created and CloudFormation-created target groups, respectively:

- targetgroup/manually-created/12345… RequestCountPerTarget
- targetgroup/CloudFormation-created/abcde… RequestCountPerTarget

Both of them show graphed data, and for the CloudFormation-created target group I can make the value increase by reloading the web page behind that load balancer and target group! So the data is getting into CloudWatch.
But the CloudWatch alarm for the CloudFormation stack still shows "Insufficient data" (i.e. no data). That alarm indicates the correct load balancer and target group, and looks identical to the one resulting from the manual setup. Remember that AWS itself created both of these alarms. Why isn't the CloudFormation one showing the data I can see in the metric?
The reason the CloudWatch alarm created by the CloudFormation stack couldn't see the actual CloudWatch metric values for the load balancer target group was a typo:
ResourceLabel: !Sub "${WebLoadBalancer.LoadBalancerFullName}}/${WebTargetGroup.TargetGroupFullName}"
I had inadvertently added an extra }. It should have been:
ResourceLabel: !Sub "${WebLoadBalancer.LoadBalancerFullName}/${WebTargetGroup.TargetGroupFullName}"
When I was comparing the manually-created and CloudFormation-created CloudWatch alarms, I finally noticed that the latter's reference to the target group ended with a }, when the target group's full name did not end in }. In fact, it would seem there is no way a load balancer target group's full name could ever end in }. But AWS happily accepted this invalid value and created a CloudWatch alarm connected to a target group that didn't exist, using a reference that could never exist.
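To illustrate the failure mode, here is a toy sketch (plain Python with hypothetical names, not the actual AWS implementation): a CloudWatch alarm only sees data points whose namespace, metric name, and dimension values match the metric verbatim, so a single stray character in a dimension leaves the alarm with no data at all.

```python
# Toy model of CloudWatch's alarm-to-metric matching: an alarm only receives
# data points whose namespace, metric name, and dimensions match *exactly*.
# All names below are hypothetical.

def alarm_sees_metric(alarm: dict, metric: dict) -> bool:
    """Verbatim comparison; there is no fuzzy matching or validation."""
    return (
        alarm["Namespace"] == metric["Namespace"]
        and alarm["MetricName"] == metric["MetricName"]
        and alarm["Dimensions"] == metric["Dimensions"]
    )

metric = {
    "Namespace": "AWS/ApplicationELB",
    "MetricName": "RequestCountPerTarget",
    "Dimensions": {"TargetGroup": "targetgroup/foo-bar/abcde"},
}

good_alarm = dict(metric)  # dimensions copied verbatim -> sees the data
bad_alarm = {
    **metric,
    # One stray character in the dimension and the alarm matches nothing:
    "Dimensions": {"TargetGroup": "targetgroup/foo-bar/abcde}"},
}

print(alarm_sees_metric(good_alarm, metric))  # True
print(alarm_sees_metric(bad_alarm, metric))   # False -> "Insufficient data"
```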
Once I fixed the typo, things are working now. One would think that AWS might validate things here or there and help out a poor developer …
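Since neither CloudFormation nor Application Auto Scaling flags a malformed label, a pre-deployment sanity check is easy to write. This is only a sketch, under the assumption that a resolved ResourceLabel always has the documented shape app/&lt;lb-name&gt;/&lt;id&gt;/targetgroup/&lt;tg-name&gt;/&lt;id&gt;; the names and ids below are hypothetical.

```python
import re

# Expected shape of an ALBRequestCountPerTarget ResourceLabel:
#   app/<load-balancer-name>/<id>/targetgroup/<target-group-name>/<id>
# Names may contain letters, digits, and hyphens; the ids are hexadecimal.
RESOURCE_LABEL_RE = re.compile(
    r"^app/[a-zA-Z0-9-]+/[0-9a-f]+/targetgroup/[a-zA-Z0-9-]+/[0-9a-f]+$"
)

def is_valid_resource_label(label: str) -> bool:
    """Reject labels with stray characters (like a leftover '}')."""
    return RESOURCE_LABEL_RE.fullmatch(label) is not None

# Hypothetical resolved values, as !Sub would produce them:
print(is_valid_resource_label(
    "app/foo-bar-lb/50dc6c495c0c9188/targetgroup/foo-bar-tg/73e2d6bc24d8a067"))
# The extra brace from the typo makes the label invalid:
print(is_valid_resource_label(
    "app/foo-bar-lb/50dc6c495c0c9188}/targetgroup/foo-bar-tg/73e2d6bc24d8a067"))
```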