Tags: amazon-web-services, amazon-ec2, amazon-ami, packer

Packer: Unable to build AMI with AWS Session Manager


I am trying to build a new GitHub runner AMI using an existing Packer template that last built an image a little over a year ago. The template uses the Amazon EBS builder and tunnels its SSH connection through AWS Session Manager (SSM):

source "amazon-ebs" "ubuntu" {
  ami_name         = "github-runner-ubuntu-jammy-amd64-${local.version_s}"
  instance_type    = "t3.medium"
  region           = "us-east-2"
  source_ami       = data.amazon-ami.ubuntu-2204.id
  ssh_username     = "ubuntu"
  ssh_interface    = "session_manager"
  pause_before_ssm = "3m"
  aws_polling {
    delay_seconds = 30
    max_attempts  = 50
  }
  temporary_iam_instance_profile_policy_document {
    Version = "2012-10-17"
    Statement {
      Effect = "Allow"
      Action = [
        "ssm:DescribeAssociation",
        "ssm:GetDeployablePatchSnapshotForInstance",
        "ssm:GetDocument",
        "ssm:DescribeDocument",
        "ssm:GetManifest",
        "ssm:GetParameter",
        "ssm:GetParameters",
        "ssm:ListAssociations",
        "ssm:ListInstanceAssociations",
        "ssm:PutInventory",
        "ssm:PutComplianceItems",
        "ssm:PutConfigurePackageResult",
        "ssm:UpdateAssociationStatus",
        "ssm:UpdateInstanceAssociationStatus",
        "ssm:UpdateInstanceInformation"
      ]
      Resource = ["*"]
    }
    Statement {
      Effect = "Allow"
      Action = [
        "ssmmessages:CreateControlChannel",
        "ssmmessages:CreateDataChannel",
        "ssmmessages:OpenControlChannel",
        "ssmmessages:OpenDataChannel"
      ]
      Resource = ["*"]
    }
    Statement {
      Effect = "Allow"
      Action = [
        "ec2messages:AcknowledgeMessage",
        "ec2messages:DeleteMessage",
        "ec2messages:FailMessage",
        "ec2messages:GetEndpoint",
        "ec2messages:GetMessages",
        "ec2messages:SendReply"
      ]
      Resource = ["*"]
    }
  }
  skip_create_ami = false
  deprecate_at    = local.deprecate_s
  tags = merge(
    var.global_tags,
    var.ami_tags,
    {
      OS_Version    = "ubuntu-jammy"
      Release       = "Latest"
      Base_AMI_Name = "{{ .SourceAMIName }}"
  })
  snapshot_tags = merge(
    var.global_tags,
    var.snapshot_tags,
  )

  launch_block_device_mappings {
    device_name           = "/dev/sda1"
    volume_size           = "15"
    volume_type           = "gp3"
    delete_on_termination = "true"
  }
}

The source AMI data source grabs the latest ubuntu-jammy-22.04-amd64 image (at the time of writing: ubuntu-jammy-22.04-amd64-server-20240801). The temporary IAM instance profile policy document mirrors the AmazonSSMManagedEC2InstanceDefaultPolicy managed policy. Session Manager currently has CloudWatch logging, S3 logging, and KMS encryption all disabled. This is Packer version 1.11.2 with AWS Session Manager plugin version 1.2.650.0.
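
The ubuntu-2204 data source referenced above is not shown; a minimal sketch of what such a lookup typically looks like is below. The filter pattern and Canonical's owner account ID are assumptions based on the AMI name quoted above, so the exact values in the real template may differ.

data "amazon-ami" "ubuntu-2204" {
  filters = {
    name                = "ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"
    root-device-type    = "ebs"
    virtualization-type = "hvm"
  }
  most_recent = true
  owners      = ["099720109477"] # Canonical's AWS account
  region      = "us-east-2"
}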

When running the build command (PACKER_LOG=1 packer build -debug -only *.amazon-ebs.* -var=ami_environment=test packer/ubuntu/packer.pkr.hcl), I get connection failures. This is what the logs show:

==> github-runner-ubuntu.amazon-ebs.ubuntu: Attaching policy to the temporary role: packer-66bcc822-537d-ca74-a3aa-08fa262d2023
==> github-runner-ubuntu.amazon-ebs.ubuntu: Launching a source AWS instance...
    github-runner-ubuntu.amazon-ebs.ubuntu: Instance ID: i-0a358f5e6c0259ef2
==> github-runner-ubuntu.amazon-ebs.ubuntu: Waiting for instance (i-0a358f5e6c0259ef2) to become ready...
2024/08/14 11:25:06 packer-plugin-amazon_v1.3.2_x5.0_linux_amd64 plugin: 2024/08/14 11:25:06 [INFO] Not using winrm communicator, skipping get password...
==> github-runner-ubuntu.amazon-ebs.ubuntu: Waiting 3m0s before establishing the SSM session...
2024/08/14 11:28:07 packer-plugin-amazon_v1.3.2_x5.0_linux_amd64 plugin: 2024/08/14 11:28:07 Found available port: 8494 on IP: 0.0.0.0
2024/08/14 11:28:07 packer-plugin-amazon_v1.3.2_x5.0_linux_amd64 plugin: 2024/08/14 11:28:07 ssm: Starting PortForwarding session to instance i-0a358f5e6c0259ef2
==> github-runner-ubuntu.amazon-ebs.ubuntu: Using SSH communicator to connect: localhost
2024/08/14 11:29:16 packer-plugin-amazon_v1.3.2_x5.0_linux_amd64 plugin: 2024/08/14 11:29:16 [INFO] Waiting for SSH, up to timeout: 5m0s    
==> github-runner-ubuntu.amazon-ebs.ubuntu: Waiting for SSH to become available...
2024/08/14 11:29:16 packer-plugin-amazon_v1.3.2_x5.0_linux_amd64 plugin: 2024/08/14 11:29:16 [DEBUG] TCP connection to SSH ip/port failed: dial tcp 127.0.0.1:8494: connect: connection refused
... (many more connection refused errors)
2024/08/14 11:34:12 packer-plugin-amazon_v1.3.2_x5.0_linux_amd64 plugin: 2024/08/14 11:34:12 [DEBUG] TCP connection to SSH ip/port failed: dial tcp 127.0.0.1:8494: connect: connection refused
2024/08/14 11:34:16 packer-plugin-amazon_v1.3.2_x5.0_linux_amd64 plugin: 2024/08/14 11:34:16 [DEBUG] SSH wait cancelled. Exiting loop.
==> github-runner-ubuntu.amazon-ebs.ubuntu: Timeout waiting for SSH

The Packer build is run from a GitHub Actions workflow, but I can also reproduce the issue by running the command from my local machine with my AWS credentials. While debugging, I can see the EC2 instance get created in the console, and I confirmed that the attached IAM role uses the temporary policies from the template. The security group has no inbound rules and allows all outbound traffic. From what I have read in the documentation, no inbound rules should be needed when using SSM. I also cannot connect to the instance from the AWS console using Session Manager; I see the error below:

SSM Agent is not online

I have tried increasing pause_before_ssm and even updating the temporary policy to allow full access to everything, but I still see the same error.

Could there be some VPC/networking changes preventing Packer from connecting to the instance? Or does the instance need inbound rules after all? Or is there some other configuration I am overlooking?


Solution

  • I run a build very similar to the one you described. I assign the instance a role/instance profile that has these three AWS-managed policies (a hypothetical Terraform sketch of attaching them follows the list):

    • AmazonSSMDirectoryServiceAccess
    • AmazonSSMManagedInstanceCore
    • CloudWatchAgentServerPolicy
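
    If you happen to manage that role with Terraform, the wiring might look like the sketch below. All resource and variable names are placeholders; adapt it to however you provision IAM.

    resource "aws_iam_role" "packer_build" {
      name               = "packer-ssm-build" # placeholder name
      assume_role_policy = data.aws_iam_policy_document.ec2_assume.json
    }

    data "aws_iam_policy_document" "ec2_assume" {
      statement {
        actions = ["sts:AssumeRole"]
        principals {
          type        = "Service"
          identifiers = ["ec2.amazonaws.com"]
        }
      }
    }

    # Attach the three managed policies listed above.
    resource "aws_iam_role_policy_attachment" "managed" {
      for_each = toset([
        "arn:aws:iam::aws:policy/AmazonSSMDirectoryServiceAccess",
        "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore",
        "arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy",
      ])
      role       = aws_iam_role.packer_build.name
      policy_arn = each.value
    }

    resource "aws_iam_instance_profile" "packer_build" {
      name = "packer-ssm-build"
      role = aws_iam_role.packer_build.name
    }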

    I am running in a VPC with 4 subnets, 2 public and 2 private, with a NAT gateway and internet gateway. I have specified one of the public subnets in my Packer script.

    I tried many things to make it work via SSM, but ultimately I got it to work using this:

    associate_public_ip_address = true
    

    The security group assigned to the instance allows inbound traffic on ports 443, 80, and 22 from my company's proxy server IP ranges, though I don't know whether all of those ports are actually required for building the image (the group is also used for other purposes). Our GitHub Actions runners reach the internet, and therefore this instance, through that proxy, which is why those ranges need access. The security group allows all outbound traffic.
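
    A hypothetical Terraform sketch of a security group along those lines is below; var.corporate_proxy_cidrs is a placeholder for your proxy IP ranges. Note that Session Manager itself only needs the outbound rule, since the SSM agent dials out over HTTPS; the ingress rules here reflect the group's other uses.

    resource "aws_security_group" "packer_build" {
      name   = "packer-ssm-build" # placeholder name
      vpc_id = var.vpc_id

      # Inbound 443/80/22 from the corporate proxy ranges.
      dynamic "ingress" {
        for_each = [443, 80, 22]
        content {
          from_port   = ingress.value
          to_port     = ingress.value
          protocol    = "tcp"
          cidr_blocks = var.corporate_proxy_cidrs
        }
      }

      # All outbound traffic allowed.
      egress {
        from_port   = 0
        to_port     = 0
        protocol    = "-1"
        cidr_blocks = ["0.0.0.0/0"]
      }
    }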

    These are some of the other settings I am using:

      ssh_username                = "ubuntu"
      vpc_id                      = var.vpc_id
      subnet_id                   = var.subnet_id
      security_group_id           = var.security_group_id
      iam_instance_profile        = var.iam_instance_profile
      associate_public_ip_address = true
      ssh_interface               = "session_manager"
    

    I recommend you launch an EC2 instance yourself, attach the same instance role/profile and security group, and tinker with those until you can connect via Session Manager from the console. Once that works, reuse the same role/profile and security group in your Packer script to rule those variables out. A hypothetical sketch of such a test instance is below.
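
    This is only a throwaway probe for iterating on the IAM and networking pieces; the AMI variable, names, and tags are placeholders.

    resource "aws_instance" "ssm_probe" {
      ami                         = var.source_ami_id # e.g. the same Ubuntu 22.04 AMI
      instance_type               = "t3.medium"
      subnet_id                   = var.subnet_id
      vpc_security_group_ids      = [var.security_group_id]
      iam_instance_profile        = var.iam_instance_profile
      associate_public_ip_address = true

      tags = {
        Name = "ssm-connectivity-probe"
      }
    }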