Tags: amazon-web-services, amazon-ecs, amazon-vpc, aws-cdk

ECS task unable to pull secrets or registry auth


I have a CDK project that creates a CodePipeline which deploys an application on ECS. It was all working previously, but the VPC used a NAT gateway, which turned out to be too expensive. I am now trying to recreate the project without a NAT gateway. I am almost there, but the ECS service now fails when it tries to start tasks. Every task fails to start with the following error:

ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval failed: unable to retrieve secret from asm: service call has been retried 5 time(s): failed to fetch secret

At this point I've kind of lost track of the different things I have tried, but I will post the relevant bits here as well as some of my attempts.

const repository = ECR.Repository.fromRepositoryAttributes(
  this,
  "ecr-repository",
  {
    repositoryArn: props.repository.arn,
    repositoryName: props.repository.name,
  }
);

// vpc
const vpc = new EC2.Vpc(this, this.resourceName(props, "vpc"), {
  maxAzs: 2,
  natGateways: 0,
  enableDnsSupport: true,
});
const vpcSecurityGroup = new SecurityGroup(this, "vpc-security-group", {
  vpc: vpc,
  allowAllOutbound: true,
});
// tried this to allow the task to access secrets manager
const vpcEndpoint = new EC2.InterfaceVpcEndpoint(this, "secrets-manager-task-vpc-endpoint", {
  vpc: vpc,
  service: EC2.InterfaceVpcEndpointAwsService.SSM,
});

const secrets = SecretsManager.Secret.fromSecretCompleteArn(
  this,
  "secrets",
  props.secrets.arn
);

const cluster = new ECS.Cluster(this, this.resourceName(props, "cluster"), {
  vpc: vpc,
  clusterName: `api-cluster`,
});

const ecsService = new EcsPatterns.ApplicationLoadBalancedFargateService(
  this,
  "ecs-service",
  {
    taskSubnets: {
      subnetType: SubnetType.PUBLIC,
    },
    securityGroups: [vpcSecurityGroup],
    serviceName: "api-service",
    cluster: cluster,
    cpu: 256,
    desiredCount: props.scaling.desiredCount,
    taskImageOptions: {
      image: ECS.ContainerImage.fromEcrRepository(
        repository,
        this.ecrTagNameParameter.stringValue
      ),
      secrets: getApplicationSecrets(secrets), // returns 
      logDriver: LogDriver.awsLogs({
        streamPrefix: "api",
        logGroup: new LogGroup(this, "ecs-task-log-group", {
          logGroupName: `${props.environment}-api`,
        }),
        logRetention: RetentionDays.TWO_MONTHS,
      }),
    },
    memoryLimitMiB: 512,
    publicLoadBalancer: true,
    domainZone: this.hostedZone,
    certificate: this.certificate,
    redirectHTTP: true,
  }
);

const scalableTarget = ecsService.service.autoScaleTaskCount({
  minCapacity: props.scaling.desiredCount,
  maxCapacity: props.scaling.maxCount,
});

scalableTarget.scaleOnCpuUtilization("cpu-scaling", {
  targetUtilizationPercent: props.scaling.cpuPercentage,
});
scalableTarget.scaleOnMemoryUtilization("memory-scaling", {
  targetUtilizationPercent: props.scaling.memoryPercentage,
});

secrets.grantRead(ecsService.taskDefinition.taskRole);
repository.grantPull(ecsService.taskDefinition.taskRole);

I read somewhere that it probably has something to do with Fargate platform version 1.4.0 vs 1.3.0, but I'm not sure what I need to change to allow the tasks to access what they need to run.


Solution

  • You need to create interface endpoints for Secrets Manager, ECR (two endpoints: the ECR API and the Docker registry), and CloudWatch Logs, as well as a gateway endpoint for S3, because ECR stores image layers in S3.

    Refer to the documentation on the topic.

    Here's an example in Python; it works the same way in TypeScript:

    # Assumes `from aws_cdk import aws_ec2 as ec2` and that `vpc` is an ec2.Vpc.
    vpc.add_interface_endpoint(
        "secretsmanager_endpoint",
        service=ec2.InterfaceVpcEndpointAwsService.SECRETS_MANAGER,
    )
    vpc.add_interface_endpoint(
        "ecr_docker_endpoint",
        service=ec2.InterfaceVpcEndpointAwsService.ECR_DOCKER,
    )
    vpc.add_interface_endpoint(
        "ecr_endpoint",
        service=ec2.InterfaceVpcEndpointAwsService.ECR,
    )
    vpc.add_interface_endpoint(
        "cloudwatch_logs_endpoint",
        service=ec2.InterfaceVpcEndpointAwsService.CLOUDWATCH_LOGS,
    )
    vpc.add_gateway_endpoint(
        "s3_endpoint",
        service=ec2.GatewayVpcEndpointAwsService.S3
    )
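
    Since the project in the question is written in TypeScript, the same endpoints might look roughly like the sketch below. It assumes CDK v2 (`aws-cdk-lib`; on v1 the import would come from `@aws-cdk/aws-ec2` instead), and the function name `addEcsVpcEndpoints` and the construct IDs are made up for illustration. Note that the question's own attempt used `InterfaceVpcEndpointAwsService.SSM`, which is Systems Manager rather than Secrets Manager.

    ```typescript
    import * as EC2 from "aws-cdk-lib/aws-ec2";

    // Hypothetical helper: `vpc` is the EC2.Vpc created with natGateways: 0.
    export function addEcsVpcEndpoints(vpc: EC2.Vpc): void {
      // Secrets Manager: needed to fetch the secrets injected into the container.
      vpc.addInterfaceEndpoint("secretsmanager-endpoint", {
        service: EC2.InterfaceVpcEndpointAwsService.SECRETS_MANAGER,
      });
      // ECR needs two interface endpoints: the Docker registry and the ECR API.
      vpc.addInterfaceEndpoint("ecr-docker-endpoint", {
        service: EC2.InterfaceVpcEndpointAwsService.ECR_DOCKER,
      });
      vpc.addInterfaceEndpoint("ecr-endpoint", {
        service: EC2.InterfaceVpcEndpointAwsService.ECR,
      });
      // CloudWatch Logs: used by the awslogs log driver.
      vpc.addInterfaceEndpoint("cloudwatch-logs-endpoint", {
        service: EC2.InterfaceVpcEndpointAwsService.CLOUDWATCH_LOGS,
      });
      // S3 gateway endpoint: ECR image layers are served from S3.
      vpc.addGatewayEndpoint("s3-endpoint", {
        service: EC2.GatewayVpcEndpointAwsService.S3,
      });
    }
    ```

    Calling `addEcsVpcEndpoints(vpc)` right after the VPC is created would replace the single `InterfaceVpcEndpoint` from the question.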
    

    Keep in mind that interface endpoints cost money as well, and may not be cheaper than a NAT gateway.
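
To make the cost caveat concrete, here is a rough sketch of the comparison. Everything in it is an assumption: the type and function names are hypothetical, the hourly rates are inputs you would take from the current AWS pricing page for your region, and per-GB data processing charges are ignored. Interface endpoints are billed per endpoint per Availability Zone, so the endpoint count is multiplied by the number of AZs; the S3 gateway endpoint is left out because gateway endpoints carry no hourly charge.

```typescript
// Hourly rates are supplied by the caller; nothing here is an official price.
interface HourlyRates {
  natGatewayPerHour: number;        // per NAT gateway
  interfaceEndpointPerHour: number; // per interface endpoint, per AZ
}

// Approximate number of hours in a month.
const HOURS_PER_MONTH = 730;

// Fixed monthly cost of running `natGateways` NAT gateways.
function monthlyNatCost(rates: HourlyRates, natGateways: number): number {
  return rates.natGatewayPerHour * natGateways * HOURS_PER_MONTH;
}

// Fixed monthly cost of `interfaceEndpoints` endpoints deployed in `azs` AZs.
function monthlyEndpointCost(
  rates: HourlyRates,
  interfaceEndpoints: number,
  azs: number
): number {
  return (
    rates.interfaceEndpointPerHour * interfaceEndpoints * azs * HOURS_PER_MONTH
  );
}

// Placeholder rates for illustration only; substitute your region's prices.
const exampleRates: HourlyRates = {
  natGatewayPerHour: 0.045,
  interfaceEndpointPerHour: 0.01,
};

// One NAT gateway vs. the four interface endpoints above across maxAzs: 2.
const natCost = monthlyNatCost(exampleRates, 1);
const endpointCost = monthlyEndpointCost(exampleRates, 4, 2);
console.log(`NAT: ${natCost.toFixed(2)}, endpoints: ${endpointCost.toFixed(2)}`);
```

With these placeholder rates, the four interface endpoints in two AZs come out more expensive per month than a single NAT gateway, which is the situation the answer warns about. Plugging in real prices and your own endpoint and AZ counts shows which side of the trade-off you are on.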