Terraform: wait until user_data has finished executing, then make an image to autoscale?


I am building a scalable service on AWS with EC2, Auto Scaling groups, SQS, and CloudWatch alarms, all managed by Terraform, with user_data doing the instance setup. Instances are launched from an aws_launch_template and scaled by an Auto Scaling group.

It works, but user_data runs again on every new instance, and that takes a lot of time. The only workaround I can see is to make a machine image (AMI) from an already provisioned instance. However, Terraform does not know whether user_data has finished executing.

I am thinking of creating an instance, making an image from it with Terraform, and then using that image in the launch template for autoscaling. I found this post, but it is not very clear to me.

I am trying this code:

resource "aws_ssm_document" "cloud_init_wait" {
  name = "cloud-init-wait"
  document_type = "Command"
  document_format = "YAML"
  content = <<-DOC
    schemaVersion: '2.2'
    description: Wait for cloud init to finish
    mainSteps:
    - action: aws:runShellScript
      name: StopOnLinux
      precondition:
        StringEquals:
        - platformType
        - Linux
      inputs:
        runCommand:
        - cloud-init status --wait
    DOC
}


resource "aws_instance" "example" {
  ami           = var.instance_ami
  instance_type = "t2.micro"
  key_name      = var.ssh_key

  provisioner "local-exec" {
    interpreter = ["/bin/bash", "-c"]

    command = <<-EOF
    set -Ee -o pipefail
    export AWS_DEFAULT_REGION=${var.aws_region}

    command_id=$(aws ssm send-command --document-name ${aws_ssm_document.cloud_init_wait.arn} --instance-ids ${self.id} --output text --query "Command.CommandId")
    if ! aws ssm wait command-executed --command-id $command_id --instance-id ${self.id}; then
      echo "Failed to start services on instance ${self.id}!";
      echo "stdout:";
      aws ssm get-command-invocation --command-id $command_id --instance-id ${self.id} --query StandardOutputContent;
      echo "stderr:";
      aws ssm get-command-invocation --command-id $command_id --instance-id ${self.id} --query StandardErrorContent;
      exit 1;
    fi;
    echo "Services started successfully on the new instance with id ${self.id}!"

    EOF
  }
}

I am getting this error:

 exit status 254. Output: 
│ An error occurred (InvalidInstanceId) when calling the SendCommand operation: Instances [[i-05b9f087e8d7dd7xx]] not in a
│ valid state for account 669201380121

Any idea how to make Terraform wait until user_data has run and then make an image to autoscale from?


Solution

  • Manual Image Creation (Old answer)

    scroll down for automated solution with Packer

    Doing this entirely in Terraform might be possible, but it seemed harder, so I first did it by hand:

    1. Launched an instance from the console.
    2. Installed the dependencies on it.
    3. Made an image (AMI) from that instance.
    4. Used the image ID as the image_id of the EC2 launch template (aws_launch_template).
    5. Used user_data only to provision the source code to run (adjust this to your needs).

    Now, whenever my Auto Scaling group creates an instance, the dependencies are already installed, so startup is much faster.

    locals {
      provision_config = <<-END
        #cloud-config
        ${jsonencode({
          write_files = [
            {
              path        = "/root/src/main.py"
              permissions = "0644"
              encoding    = "b64"
              content     = filebase64("../src/main.py")
            },
          ]
        })}
      END
    }
    
    data "cloudinit_config" "config" {
      gzip          = false
      base64_encode = true

      # Part 1: write the source file onto the instance.
      part {
        content_type = "text/cloud-config"
        filename     = "cloud-config_provision.yaml"
        content      = local.provision_config
      }

      # Part 2: prepare the working directories and start the script.
      part {
        content_type = "text/x-shellscript"
        filename     = "run_src.sh"
        content      = <<-EOF
          #!/bin/bash
          cd /root
          # -p creates parent directories and does not fail if they already exist
          mkdir -p tmp/origin tmp/converted tmp/packaged

          pip3 install boto3
          pip3 install ec2-metadata
          cd src

          python3 main.py
        EOF
      }
    }
    
    resource "aws_iam_instance_profile" "instance_profile" {
      name = "${local.resource_component}-profile"
      role = aws_iam_role.ec2_role.name
    }

    resource "aws_launch_template" "machine_template" {
      name          = "${local.resource_component}-template"
      image_id      = var.instance_ami
      instance_type = var.instance_type
      key_name      = var.ssh_key
      user_data     = data.cloudinit_config.config.rendered

      iam_instance_profile {
        name = aws_iam_instance_profile.instance_profile.name
      }

      tag_specifications {
        resource_type = "instance"
        tags = {
          Name   = "${local.resource_component}-child"
          Source = "Autoscaling"
        }
      }

      monitoring {
        enabled = true
      }

      instance_initiated_shutdown_behavior = "terminate"
    }
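
    For comparison, the Terraform-only route that seemed harder might look roughly like the sketch below. It is untested and rests on assumptions: it reuses the aws_instance.example resource from the question, whose local-exec provisioner only returns once cloud-init has finished, and the resource names "baked" and "baked_template" and both name values are hypothetical. It also assumes the instance is reachable through Systems Manager. One common cause of the InvalidInstanceId error in the question is that the instance is not (yet) registered with SSM, for example because no instance profile grants SSM access or the agent has not come online.

```hcl
# Sketch only: assumes aws_instance.example from the question, whose
# local-exec provisioner returns only after cloud-init has finished.

# Snapshot the provisioned instance into an AMI. Terraform creates this
# resource only after aws_instance.example, including its provisioners,
# has completed successfully.
resource "aws_ami_from_instance" "baked" {
  name               = "baked-worker-image" # hypothetical name
  source_instance_id = aws_instance.example.id
}

# Launch template for the Auto Scaling group, based on the baked image.
resource "aws_launch_template" "baked_template" {
  name          = "baked-worker-template" # hypothetical name
  image_id      = aws_ami_from_instance.baked.id
  instance_type = "t2.micro"
}
```

    With this layout the builder instance stays under Terraform management and keeps running after the image is taken, which is one reason a dedicated image builder can be simpler.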
    

  • Automated Image Creation with Packer using HCL (Edited Answer)

    I now build the AMI with Packer before running terraform plan, and I automated the whole sequence with a shell script; you can find my solution here.
    I found this to be the best of all the approaches: with autoscaling it is even faster, since nothing has to be provisioned again and again. user_data also has limitations (for example, its size limit), which this approach avoids easily.
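
    A Packer template for this kind of setup could look roughly like the minimal sketch below. It is not the template from the linked solution. The variable names, the source name "worker", the AMI name prefix, the "ubuntu" SSH user, and the presence of pip3 on the base image are all assumptions to adapt.

```hcl
packer {
  required_plugins {
    amazon = {
      version = ">= 1.2.0"
      source  = "github.com/hashicorp/amazon"
    }
  }
}

# Hypothetical inputs: region and base image to build from.
variable "aws_region" {
  type = string
}

variable "source_ami" {
  type = string
}

source "amazon-ebs" "worker" {
  region        = var.aws_region
  source_ami    = var.source_ami
  instance_type = "t2.micro"
  ssh_username  = "ubuntu" # assumption: Ubuntu-based base image

  # A timestamp suffix keeps AMI names unique across builds.
  ami_name = "worker-${formatdate("YYYYMMDDhhmmss", timestamp())}"
}

build {
  sources = ["source.amazon-ebs.worker"]

  # Bake the dependencies into the image so user_data no longer installs them.
  provisioner "shell" {
    inline = [
      "cloud-init status --wait", # let the base image finish booting first
      "sudo pip3 install boto3 ec2-metadata",
    ]
  }
}
```

    Running packer init followed by packer build with the two variables set produces an AMI whose ID can then be passed to Terraform as var.instance_ami. Alternatively, an aws_ami data source filtered on the name prefix can look up the most recent build.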