I have a CDK project that creates a CodePipeline which deploys an application on ECS. I had it all previously working, but the VPC was using a NAT gateway, which ended up being too expensive. So now I am trying to recreate the project without requiring a NAT gateway. I am almost there, but I have now run into issues when the ECS service is trying to start tasks. All tasks fail to start with the following error:
ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval failed: unable to retrieve secret from asm: service call has been retried 5 time(s): failed to fetch secret
At this point I've kind of lost track of the different things I have tried, but I will post the relevant bits here as well as some of my attempts.
const repository = ECR.Repository.fromRepositoryAttributes(
this,
"ecr-repository",
{
repositoryArn: props.repository.arn,
repositoryName: props.repository.name,
}
);
// vpc
const vpc = new EC2.Vpc(this, this.resourceName(props, "vpc"), {
maxAzs: 2,
natGateways: 0,
enableDnsSupport: true,
});
const vpcSecurityGroup = new SecurityGroup(this, "vpc-security-group", {
vpc: vpc,
allowAllOutbound: true,
});
// tried this to allow the task to access secrets manager
const vpcEndpoint = new EC2.InterfaceVpcEndpoint(this, "secrets-manager-task-vpc-endpoint", {
vpc: vpc,
service: EC2.InterfaceVpcEndpointAwsService.SSM,
});
const secrets = SecretsManager.Secret.fromSecretCompleteArn(
this,
"secrets",
props.secrets.arn
);
const cluster = new ECS.Cluster(this, this.resourceName(props, "cluster"), {
vpc: vpc,
clusterName: `api-cluster`,
});
const ecsService = new EcsPatterns.ApplicationLoadBalancedFargateService(
this,
"ecs-service",
{
taskSubnets: {
subnetType: SubnetType.PUBLIC,
},
securityGroups: [vpcSecurityGroup],
serviceName: "api-service",
cluster: cluster,
cpu: 256,
desiredCount: props.scaling.desiredCount,
taskImageOptions: {
image: ECS.ContainerImage.fromEcrRepository(
repository,
this.ecrTagNameParameter.stringValue
),
secrets: getApplicationSecrets(secrets), // returns
logDriver: LogDriver.awsLogs({
streamPrefix: "api",
logGroup: new LogGroup(this, "ecs-task-log-group", {
logGroupName: `${props.environment}-api`,
}),
logRetention: RetentionDays.TWO_MONTHS,
}),
},
memoryLimitMiB: 512,
publicLoadBalancer: true,
domainZone: this.hostedZone,
certificate: this.certificate,
redirectHTTP: true,
}
);
const scalableTarget = ecsService.service.autoScaleTaskCount({
minCapacity: props.scaling.desiredCount,
maxCapacity: props.scaling.maxCount,
});
scalableTarget.scaleOnCpuUtilization("cpu-scaling", {
targetUtilizationPercent: props.scaling.cpuPercentage,
});
scalableTarget.scaleOnMemoryUtilization("memory-scaling", {
targetUtilizationPercent: props.scaling.memoryPercentage,
});
secrets.grantRead(ecsService.taskDefinition.taskRole);
repository.grantPull(ecsService.taskDefinition.taskRole);
I read somewhere that it probably has something to do with Fargate version 1.4.0 vs 1.3.0, but I'm not sure what I need to change to allow the tasks to access what they need to run.
You need to create an interface endpoints for Secrets Manager, ECR (two types of endpoints), CloudWatch, as well as a gateway endpoint for S3.
Refer to the documentation on the topic.
Here's an example in Python, it'd work the same in TS:
vpc.add_interface_endpoint(
"secretsmanager_endpoint",
service=ec2.InterfaceVpcEndpointAwsService.SECRETS_MANAGER,
)
vpc.add_interface_endpoint(
"ecr_docker_endpoint",
service=ec2.InterfaceVpcEndpointAwsService.ECR_DOCKER,
)
vpc.add_interface_endpoint(
"ecr_endpoint",
service=ec2.InterfaceVpcEndpointAwsService.ECR,
)
vpc.add_interface_endpoint(
"cloudwatch_logs_endpoint",
service=ec2.InterfaceVpcEndpointAwsService.CLOUDWATCH_LOGS,
)
vpc.add_gateway_endpoint(
"s3_endpoint",
service=ec2.GatewayVpcEndpointAwsService.S3
)
Keep in mind that interface endpoints cost money as well, and may not be cheaper than a NAT.