Search code examples
aws-cloudformationamazon-linux-2aws-cdk

AWS CDK CloudFormationInit timeout when installing yum package


I am trying to deploy the CDK stack below:

class MyCdkStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        vpc = ec2.Vpc.from_lookup(self, "VPC", vpc_id=EXISTING_VPC_ID)

        amzn_linux = ec2.MachineImage.latest_amazon_linux(
            generation=ec2.AmazonLinuxGeneration.AMAZON_LINUX_2
        )

        role = iam.Role(
            self, "Role", assumed_by=iam.ServicePrincipal("ec2.amazonaws.com")
        )

        role.add_managed_policy(
            iam.ManagedPolicy.from_aws_managed_policy_name(
                "AmazonSSMManagedInstanceCore"
            )
        )

        instance = ec2.Instance(
            self,
            "Instance",
            instance_type=ec2.InstanceType("t3.micro"),
            machine_image=amzn_linux,
            vpc=vpc,
            vpc_subnets=ec2.SubnetSelection(subnet_type=ec2.SubnetType.PUBLIC),
            role=role,
            init=ec2.CloudFormationInit.from_elements(
                ec2.InitPackage.yum("docker"),
            ),
            init_options=ec2.ApplyCloudFormationInitOptions(
                timeout=Duration.minutes(5),
                ignore_failures=True,
            ),
        )
        # Allow ssh connections from anywhere
        instance.connections.allow_from_any_ipv4(ec2.Port.tcp(22))

        # Elastic IP
        eip = ec2.CfnEIP(self, "EIP", instance_id=instance.instance_id)

        # Outputs
        CfnOutput(self, "EIP Address", value=eip.ref)

The deployment fails after 5 minutes and rolls back with the following error message:

Failed to receive 1 resource signal(s) within the specified duration

Here are possible problems I have considered:

  1. The server might not have outbound internet access (but I have put it on a public subnet).
  2. I've tried using an Amazon Linux 2022 AMI instead.
  3. The 5 minute timeout might not be sufficient (but I have tried increasing to 15 minutes to no avail).
  4. There is something else wrong with my setup (but without the CloudFormationInit stuff the server is created as expected).
  5. Yum installing docker might be impossible (but if I create the server without the CloudFormationInit stuff, I can SSH into the instance and then sudo yum install docker works.
  6. The server is not allowed to send cfg signals (but the raw CloudFormation template created by CDK seems to include the relevant auto-generated user data and permissions, see below):
// Excerpts from autogenerated CDK template json
"UserData": {
          "Fn::Base64": {
            "Fn::Join": [
              "",
              [
                "#!/bin/bash\n# fingerprint: 7d8f48713aedxxxx\n(\n  set +e\n  /opt/aws/bin/cfn-init -v --region ",
                {
                  "Ref": "AWS::Region"
                },
                " --stack ",
                {
                  "Ref": "AWS::StackName"
                },
                " --resource Instance5FFEF8E4e0ce835dd5aaxxxx -c default\n  /opt/aws/bin/cfn-signal -e 0 --region ",
                {
                  "Ref": "AWS::Region"
                },
                " --stack ",
                {
                  "Ref": "AWS::StackName"
                },
                " --resource Instance5FFEF8E4e0ce835dd5aaxxxx\n  cat /var/log/cfn-init.log >&2\n)"
              ]
            ]
          }
        }

// -----

"RoleDefaultPolicy5FFBxxx": {
      "Type": "AWS::IAM::Policy",
      "Properties": {
        "PolicyDocument": {
          "Statement": [
            {
              "Action": [
                "cloudformation:DescribeStackResource",
                "cloudformation:SignalResource"
              ],
              "Effect": "Allow",
              "Resource": {
                "Ref": "AWS::StackId"
              }
            }
          ],
          "Version": "2012-10-17"
        },
        "PolicyName": "RoleDefaultPolicy5FFB7xxx",
        "Roles": [
          {
            "Ref": "Role1ABCxxxx"
          }
        ]
      },
      "Metadata": {
        "aws:cdk:path": "xxx/Role/DefaultPolicy/Resource"
      }
    },


Wondering what else there is left for me to try! Any help would be greatly appreciated. I have that sinking feeling that I've overlooked something obvious...

Edit: In response to Paolo's comment, here is the full output from cdk synth with identifiers obfuscated.

Resources:
  Role1ABCXXXX:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Statement:
          - Action: sts:AssumeRole
            Effect: Allow
            Principal:
              Service: ec2.amazonaws.com
        Version: "2012-10-17"
      ManagedPolicyArns:
        - Fn::Join:
            - ""
            - - "arn:"
              - Ref: AWS::Partition
              - :iam::aws:policy/AmazonSSMManagedInstanceCore
    Metadata:
      aws:cdk:path: MyCDK/Role/Resource
  RoleDefaultPolicy5FFBXXXX:
    Type: AWS::IAM::Policy
    Properties:
      PolicyDocument:
        Statement:
          - Action:
              - cloudformation:DescribeStackResource
              - cloudformation:SignalResource
            Effect: Allow
            Resource:
              Ref: AWS::StackId
        Version: "2012-10-17"
      PolicyName: RoleDefaultPolicy5FFBXXXX
      Roles:
        - Ref: Role1ABCXXXX
    Metadata:
      aws:cdk:path: MyCDK/Role/DefaultPolicy/Resource
  InstanceInstanceSecurityGroup698618EC:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: MyCDK/Instance/InstanceSecurityGroup
      SecurityGroupEgress:
        - CidrIp: 0.0.0.0/0
          Description: Allow all outbound traffic by default
          IpProtocol: "-1"
      SecurityGroupIngress:
        - CidrIp: 0.0.0.0/0
          Description: from 0.0.0.0/0:22
          FromPort: 22
          IpProtocol: tcp
          ToPort: 22
      VpcId: vpc-07848d9441fddea14
    Metadata:
      aws:cdk:path: MyCDK/Instance/InstanceSecurityGroup/Resource
  InstanceInstanceProfile01ECXXXX:
    Type: AWS::IAM::InstanceProfile
    Properties:
      Roles:
        - Ref: Role1ABCXXXX
    Metadata:
      aws:cdk:path: MyCDK/Instance/InstanceProfile
  Instance5FFEF8E47f468d710e75XXXX:
    Type: AWS::EC2::Instance
    Properties:
      AvailabilityZone: eu-central-1a
      IamInstanceProfile:
        Ref: InstanceInstanceProfile01ECXXXX
      ImageId:
        Ref: SsmParameterValueawsserviceamiamazonlinuxlatestamzn2amihvmx8664gp2C96584B6F00A464EAD1953AFF4B05118Parameter
      InstanceType: t3.micro
      SecurityGroupIds:
        - Fn::GetAtt:
            - InstanceInstanceSecurityGroup698618EC
            - GroupId
      SubnetId: subnet-079be82ff7754XXXX
      UserData:
        Fn::Base64:
          Fn::Join:
            - ""
            - - |-
                #!/bin/bash
                # fingerprint: 5af534616771e4af
                (
                  set +e
                  /opt/aws/bin/cfn-init -v --region 
              - Ref: AWS::Region
              - " --stack "
              - Ref: AWS::StackName
              - |-2
                 --resource Instance5FFEF8E47f468d710e75XXXX -c default
                  /opt/aws/bin/cfn-signal -e 0 --region 
              - Ref: AWS::Region
              - " --stack "
              - Ref: AWS::StackName
              - |-2
                 --resource Instance5FFEF8E47f468d710e75XXXX
                  cat /var/log/cfn-init.log >&2
                )
    DependsOn:
      - RoleDefaultPolicy5FFBXXXX
      - Role1ABCXXXX
    CreationPolicy:
      ResourceSignal:
        Count: 1
        Timeout: PT5M
    Metadata:
      aws:cdk:path: MyCDK/Instance/Resource
      AWS::CloudFormation::Init:
        configSets:
          default:
            - config
        config:
          packages:
            yum:
              docker: []
  EIP:
    Type: AWS::EC2::EIP
    Properties:
      InstanceId:
        Ref: Instance5FFEF8E47f468d710e75XXXX
    Metadata:
      aws:cdk:path: MyCDK/EIP
  CDKMetadata:
    Type: AWS::CDK::Metadata
    Properties:
      Analytics: v2:deflate64:H4sIAAAAAAAA/2VOyQ6CMBD9Fu5lFDwYz8YYTjbwAabWIY6UlnSJIU3/XcDt4OmteXklFFtYZ+LhcnntckUXiI0XsmM1OhOsRDZl50iih1gbhWzf6gW5USTHWf5YpZ0XWiK3piWFiaEsIX5c1qAMlvx4tXXXX//P+FYnfqh4Ssu+sKJHj3YWp+CH4JcX74OJ8dHfjF5tYAdFmd0dUW6D9tQj1C98AstX0JrnXXXX
    Metadata:
      aws:cdk:path: MyCDK/CDKMetadata/Default
Parameters:
  SsmParameterValueawsserviceamiamazonlinuxlatestamzn2amihvmx8664gp2C96584B6F00A464EAD1953AFF4B05118Parameter:
    Type: AWS::SSM::Parameter::Value<AWS::EC2::Image::Id>
    Default: /aws/service/ami-amazon-linux-latest/amzn2-ami-hvm-x86_64-gp2
  BootstrapVersion:
    Type: AWS::SSM::Parameter::Value<String>
    Default: /cdk-bootstrap/hnb659fds/version
    Description: Version of the CDK Bootstrap resources in this environment, automatically retrieved from SSM Parameter Store. [cdk:skip]
Outputs:
  EIPAddress:
    Value:
      Ref: EIP
Rules:
  CheckBootstrapVersion:
    Assertions:
      - Assert:
          Fn::Not:
            - Fn::Contains:
                - - "1"
                  - "2"
                  - "3"
                  - "4"
                  - "5"
                - Ref: BootstrapVersion
        AssertDescription: CDK bootstrap stack version 6 required. Please run 'cdk bootstrap' with a recent version of the CDK CLI.à

Edit 2: Here is the init-cloud-output.log.

Cloud-init v. 19.3-45.amzn2 running 'init-local' at Mon, 30 May 2022 10:42:35 +0000. Up 6.48 seconds.
Cloud-init v. 19.3-45.amzn2 running 'init' at Mon, 30 May 2022 10:42:37 +0000. Up 7.60 seconds.
ci-info: ++++++++++++++++++++++++++++++++++++++Net device info++++++++++++++++++++++++++++++++++++++
ci-info: +--------+------+----------------------------+---------------+--------+-------------------+
ci-info: | Device |  Up  |          Address           |      Mask     | Scope  |     Hw-Address    |
ci-info: +--------+------+----------------------------+---------------+--------+-------------------+
ci-info: |  eth0  | True |         10.0.0.156         | 255.255.255.0 | global | 02:6c:e8:e3:39:84 |
ci-info: |  eth0  | True | fe80::6c:e8ff:fee3:3984/64 |       .       |  link  | 02:6c:e8:e3:39:84 |
ci-info: |   lo   | True |         127.0.0.1          |   255.0.0.0   |  host  |         .         |
ci-info: |   lo   | True |          ::1/128           |       .       |  host  |         .         |
ci-info: +--------+------+----------------------------+---------------+--------+-------------------+
ci-info: ++++++++++++++++++++++++++++++Route IPv4 info+++++++++++++++++++++++++++++++
ci-info: +-------+-----------------+----------+-----------------+-----------+-------+
ci-info: | Route |   Destination   | Gateway  |     Genmask     | Interface | Flags |
ci-info: +-------+-----------------+----------+-----------------+-----------+-------+
ci-info: |   0   |     0.0.0.0     | 10.0.0.1 |     0.0.0.0     |    eth0   |   UG  |
ci-info: |   1   |     10.0.0.0    | 0.0.0.0  |  255.255.255.0  |    eth0   |   U   |
ci-info: |   2   | 169.254.169.254 | 0.0.0.0  | 255.255.255.255 |    eth0   |   UH  |
ci-info: +-------+-----------------+----------+-----------------+-----------+-------+
ci-info: +++++++++++++++++++Route IPv6 info+++++++++++++++++++
ci-info: +-------+-------------+---------+-----------+-------+
ci-info: | Route | Destination | Gateway | Interface | Flags |
ci-info: +-------+-------------+---------+-----------+-------+
ci-info: |   9   |  fe80::/64  |    ::   |    eth0   |   U   |
ci-info: |   11  |    local    |    ::   |    eth0   |   U   |
ci-info: |   12  |   ff00::/8  |    ::   |    eth0   |   U   |
ci-info: +-------+-------------+---------+-----------+-------+
Cloud-init v. 19.3-45.amzn2 running 'modules:config' at Mon, 30 May 2022 10:42:38 +0000. Up 9.21 seconds.
Loaded plugins: extras_suggestions, langpacks, priorities, update-motd


 One of the configured repositories failed (Unknown),
 and yum doesn't have enough cached data to continue. At this point the only
 safe thing yum can do is fail. There are a few ways to work "fix" this:

     1. Contact the upstream for the repository and get them to fix the problem.

     2. Reconfigure the baseurl/etc. for the repository, to point to a working
        upstream. This is most often useful if you are using a newer
        distribution release than is supported by the repository (and the
        packages for the previous distribution release still work).

     3. Run the command with the repository temporarily disabled
            yum --disablerepo=<repoid> ...

     4. Disable the repository permanently, so yum won't use it by default. Yum
        will then just ignore the repository until you permanently enable it
        again or use --enablerepo for temporary usage:

            yum-config-manager --disable <repoid>
        or
            subscription-manager repos --disable=<repoid>

     5. Configure the failing repository to be skipped, if it is unavailable.
        Note that yum will try to contact the repo. when it runs most commands,
        so will have to try and fail each time (and thus. yum will be be much
        slower). If it is a very temporary problem though, this is often a nice
        compromise:

            yum-config-manager --save --setopt=<repoid>.skip_if_unavailable=true

Cannot find a valid baseurl for repo: amzn2-core/2/x86_64
Could not retrieve mirrorlist https://amazonlinux-2-repos-eu-central-1.s3.dualstack.eu-central-1.amazonaws.com/2/core/latest/x86_64/mirror.list error was
12: Timeout on https://amazonlinux-2-repos-eu-central-1.s3.dualstack.eu-central-1.amazonaws.com/2/core/latest/x86_64/mirror.list: (28, 'Failed to connect to amazonlinux-2-repos-eu-central-1.s3.dualstack.eu-central-1.amazonaws.com port 443 after 2702 ms: Connection timed out')
May 30 10:42:58 cloud-init[2199]: util.py[WARNING]: Package upgrade failed
May 30 10:42:58 cloud-init[2199]: cc_package_update_upgrade_install.py[WARNING]: 1 failed with exceptions, re-raising the last one
May 30 10:42:58 cloud-init[2199]: util.py[WARNING]: Running module package-update-upgrade-install (<module 'cloudinit.config.cc_package_update_upgrade_install' from '/usr/lib/python2.7/site-packages/cloudinit/config/cc_package_update_upgrade_install.pyc'>) failed
Cloud-init v. 19.3-45.amzn2 running 'modules:final' at Mon, 30 May 2022 10:42:59 +0000. Up 29.98 seconds.
Unknown error retrieving Instance5FFEF8E4e0ce835dd5aaXXXX
ValidationError: Stack arn:aws:cloudformation:eu-central-1:ACCOUNT_ID:stack/MyCDK/d1772460-e004-11ec-b341-29280531XXXX is in CREATE_FAILED state and cannot be signaled
2022-05-30 10:43:00,475 [DEBUG] CloudFormation client initialized with endpoint https://cloudformation.eu-central-1.amazonaws.com
2022-05-30 10:43:00,476 [DEBUG] Describing resource Instance5FFEF8E4e0ce835dd5aaXXXX in stack MyCDK
2022-05-30 10:44:00,476 [WARNING] Timeout of 60 seconds breached
2022-05-30 10:44:00,476 [ERROR] Client-side timeout
Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/cfnbootstrap/util.py", line 189, in _retry
    return f(*args, **kwargs)
  File "/usr/lib/python3.7/site-packages/cfnbootstrap/util.py", line 263, in _timeout
    "Execution did not succeed after %s seconds" % duration)
cfnbootstrap.util.TimeoutError
2022-05-30 10:44:00,478 [DEBUG] Sleeping for 0.648091 seconds before retrying
2022-05-30 10:44:01,128 [DEBUG] Describing resource Instance5FFEF8E4e0ce835dd5aaXXXX in stack MyCDK
2022-05-30 10:45:01,128 [WARNING] Timeout of 60 seconds breached
2022-05-30 10:45:01,128 [ERROR] Client-side timeout
Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/cfnbootstrap/util.py", line 189, in _retry
    return f(*args, **kwargs)
  File "/usr/lib/python3.7/site-packages/cfnbootstrap/util.py", line 263, in _timeout
    "Execution did not succeed after %s seconds" % duration)
cfnbootstrap.util.TimeoutError
2022-05-30 10:45:01,129 [DEBUG] Sleeping for 2.585657 seconds before retrying
2022-05-30 10:45:03,717 [DEBUG] Describing resource Instance5FFEF8E4e0ce835dd5aaXXXX in stack MyCDK
2022-05-30 10:46:03,717 [WARNING] Timeout of 60 seconds breached
2022-05-30 10:46:03,718 [ERROR] Client-side timeout
Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/cfnbootstrap/util.py", line 189, in _retry
    return f(*args, **kwargs)
  File "/usr/lib/python3.7/site-packages/cfnbootstrap/util.py", line 263, in _timeout
    "Execution did not succeed after %s seconds" % duration)
cfnbootstrap.util.TimeoutError
2022-05-30 10:46:03,718 [DEBUG] Sleeping for 4.082728 seconds before retrying
2022-05-30 10:46:07,805 [DEBUG] Describing resource Instance5FFEF8E4e0ce835dd5aaXXXX in stack MyCDK
2022-05-30 10:47:07,805 [WARNING] Timeout of 60 seconds breached
2022-05-30 10:47:07,806 [ERROR] Client-side timeout
Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/cfnbootstrap/util.py", line 189, in _retry
    return f(*args, **kwargs)
  File "/usr/lib/python3.7/site-packages/cfnbootstrap/util.py", line 263, in _timeout
    "Execution did not succeed after %s seconds" % duration)
cfnbootstrap.util.TimeoutError
2022-05-30 10:47:07,806 [DEBUG] Sleeping for 11.379097 seconds before retrying
2022-05-30 10:47:19,197 [DEBUG] Describing resource Instance5FFEF8E4e0ce835dd5aaXXXX in stack MyCDK
2022-05-30 10:48:19,197 [WARNING] Timeout of 60 seconds breached
2022-05-30 10:48:19,197 [ERROR] Client-side timeout
Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/cfnbootstrap/util.py", line 189, in _retry
    return f(*args, **kwargs)
  File "/usr/lib/python3.7/site-packages/cfnbootstrap/util.py", line 263, in _timeout
    "Execution did not succeed after %s seconds" % duration)
cfnbootstrap.util.TimeoutError
2022-05-30 10:48:19,521 [DEBUG] CloudFormation client initialized with endpoint https://cloudformation.eu-central-1.amazonaws.com
2022-05-30 10:48:19,523 [DEBUG] Signaling resource Instance5FFEF8E4e0ce835dd5aaXXXX in stack MyCDK with unique ID i-0b3eb81ec6a111218 and status SUCCESS
2022-05-30 10:49:19,524 [WARNING] Timeout of 60 seconds breached
2022-05-30 10:49:19,524 [ERROR] Client-side timeout
Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/cfnbootstrap/util.py", line 189, in _retry
    return f(*args, **kwargs)
  File "/usr/lib/python3.7/site-packages/cfnbootstrap/util.py", line 263, in _timeout
    "Execution did not succeed after %s seconds" % duration)
cfnbootstrap.util.TimeoutError
2022-05-30 10:49:19,525 [DEBUG] Sleeping for 0.292454 seconds before retrying
2022-05-30 10:49:19,818 [DEBUG] Signaling resource Instance5FFEF8E4e0ce835dd5aaXXXX in stack MyCDK with unique ID i-0b3eb81ec6a111218 and status SUCCESS
2022-05-30 10:50:19,818 [WARNING] Timeout of 60 seconds breached
2022-05-30 10:50:19,818 [ERROR] Client-side timeout
Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/cfnbootstrap/util.py", line 189, in _retry
    return f(*args, **kwargs)
  File "/usr/lib/python3.7/site-packages/cfnbootstrap/util.py", line 263, in _timeout
    "Execution did not succeed after %s seconds" % duration)
cfnbootstrap.util.TimeoutError
2022-05-30 10:50:19,819 [DEBUG] Sleeping for 1.337550 seconds before retrying
2022-05-30 10:50:21,158 [DEBUG] Signaling resource Instance5FFEF8E4e0ce835dd5aaXXXX in stack MyCDK with unique ID i-0b3eb81ec6a111218 and status SUCCESS
2022-05-30 10:51:21,158 [WARNING] Timeout of 60 seconds breached
2022-05-30 10:51:21,158 [ERROR] Client-side timeout
Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/cfnbootstrap/util.py", line 189, in _retry
    return f(*args, **kwargs)
  File "/usr/lib/python3.7/site-packages/cfnbootstrap/util.py", line 263, in _timeout
    "Execution did not succeed after %s seconds" % duration)
cfnbootstrap.util.TimeoutError
2022-05-30 10:51:21,159 [DEBUG] Sleeping for 6.997329 seconds before retrying
2022-05-30 10:51:28,163 [DEBUG] Signaling resource Instance5FFEF8E4e0ce835dd5aaXXXX in stack MyCDK with unique ID i-0b3eb81ec6a111218 and status SUCCESS
2022-05-30 10:52:28,164 [WARNING] Timeout of 60 seconds breached
2022-05-30 10:52:28,164 [ERROR] Client-side timeout
Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/cfnbootstrap/util.py", line 189, in _retry
    return f(*args, **kwargs)
  File "/usr/lib/python3.7/site-packages/cfnbootstrap/util.py", line 263, in _timeout
    "Execution did not succeed after %s seconds" % duration)
cfnbootstrap.util.TimeoutError
2022-05-30 10:52:28,164 [DEBUG] Sleeping for 5.279977 seconds before retrying
2022-05-30 10:52:33,450 [DEBUG] Signaling resource Instance5FFEF8E4e0ce835dd5aaXXXX in stack MyCDK with unique ID i-0b3eb81ec6a111218 and status SUCCESS
ci-info: no authorized ssh keys fingerprints found for user ec2-user.
Cloud-init v. 19.3-45.amzn2 finished at Mon, 30 May 2022 10:52:33 +0000. Datasource DataSourceEc2.  Up 604.40 seconds

Solution

  • The problem was that the instance didn't have internet access (despite being on a public subnet).

    The reason for this was that the VPC is not our default VPC, and therefore the public subnet we created did not have Auto-assign public IPv4 address enabled. Enabling this setting fixed the problem.

    Phew!