amazon-ec2 aws-opsworks rhel7

Instances in custom layers always get start_failed status


No OpsWorks logs are created for the instances, so I don't have much debug information, but I will try to be as descriptive as possible. Any hints or ideas are much appreciated.

I have a bunch of custom layers: some are service layers, some are MongoDB layers, and one is a custom memcached layer.

I have attempted launching one instance in each layer, on both RHEL7 and Amazon Linux (2016.03) instances (both the latest versions, with the latest OpsWorks agent, version 3436) and Chef 11.10.

When the MongoDB layers have instances that do not overlap with the service layers, those instances fail with the status start_failed every time, on both operating systems.

When I create instances that are shared by both a MongoDB layer and a service layer, the instance moves into the setup stage and then through the rest of the process every time (barring errors in my own Chef code).

According to EC2, the instance is launched and online, and all the status checks pass. I have looked at the instance system logs from the EC2 dashboard and there aren't any system-level errors. I cannot SSH into the instance to investigate further, since the IAM users never load.

All of the instances get the same custom recipes. Each recipe decides at run time whether to proceed on that instance, and skips itself if the layer and deployment do not match, so I don't believe this is a recipe discrepancy.

My best guess is that this could be an agent-related issue, but that is nothing more than a gut feeling at this point.

Has anyone else had a similar problem, or can anyone point me in the right direction?

Update

I figured out how to SSH into the instance. It had a private IP but no public IP, so I had to connect from another OpsWorks instance. Anyway, I found the following error in /var/log/aws/opsworks/user-data.log:

/tmp/opsworks-agent-installer/opsworks-agent/lib/bootstrap/utils.rb:111:in `block (2 levels) in execute': Failed to execute "yum --assumeyes update" pid 9536 exit 1: Loaded plugins: amazon-id, rhui-lb, search-disabled-repos (RuntimeError)


Could not contact any CDS load balancers: rhui2-cds01.us-east-1.aws.ce.redhat.com, rhui2-cds02.us-east-1.aws.ce.redhat.com.
Could not contact CDS load balancer rhui2-cds01.us-east-1.aws.ce.redhat.com, trying others.
    from /tmp/opsworks-agent-installer/opsworks-agent/lib/bootstrap/utils.rb:99:in `loop'
    from /tmp/opsworks-agent-installer/opsworks-agent/lib/bootstrap/utils.rb:99:in `block in execute'
    from /tmp/opsworks-agent-installer/opsworks-agent/lib/bootstrap/utils.rb:98:in `chdir'
    from /tmp/opsworks-agent-installer/opsworks-agent/lib/bootstrap/utils.rb:98:in `execute'
    from /tmp/opsworks-agent-installer/opsworks-agent/lib/bootstrap/utils.rb:14:in `yum'
    from /tmp/opsworks-agent-installer/opsworks-agent/lib/bootstrap/instance_agent_installer.rb:57:in `install_system_updates'
    from /tmp/opsworks-agent-installer/opsworks-agent/lib/bootstrap/instance_agent_installer.rb:25:in `block in run'
    from /tmp/opsworks-agent-installer/opsworks-agent/lib/bootstrap/log.rb:96:in `measure'
    from /tmp/opsworks-agent-installer/opsworks-agent/lib/bootstrap/instance_agent_installer.rb:25:in `run'
    from /tmp/opsworks-agent-installer/opsworks-agent/lib/bootstrap/instance_agent_installer.rb:10:in `run'
    from /tmp/opsworks-agent-installer/opsworks-agent/bin/opsworks-agent-installer.rb:8:in `<main>'
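The trace shows `yum --assumeyes update` failing because the instance cannot reach the RHUI content servers. A quick way to confirm that this is an outbound connectivity problem, rather than a repository problem, is to attempt a TCP connection from the instance. The following is a minimal sketch in Python: the hostnames are copied from the error above, while port 443 and the helper names `can_connect` and `unreachable_hosts` are assumptions made for illustration.

```python
import socket

# Hostnames copied from the error message; port 443 is an assumption
# (the RHUI repositories are expected to be served over HTTPS).
CDS_HOSTS = [
    "rhui2-cds01.us-east-1.aws.ce.redhat.com",
    "rhui2-cds02.us-east-1.aws.ce.redhat.com",
]


def can_connect(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def unreachable_hosts(hosts, port=443, timeout=5.0):
    """Return the subset of hosts that could not be reached."""
    return [h for h in hosts if not can_connect(h, port, timeout)]
```

Running `unreachable_hosts(CDS_HOSTS)` on the failing instance and on a working one should make the difference visible: if every host comes back unreachable only on the failing instance, the instance most likely has no route out of the VPC.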

Solution

  • The public IP address option was turned off for the custom database layers. To communicate with OpsWorks from inside the VPC, download the cookbooks, and then install packages, an instance needs either a public IP address or a NAT instance.

    Public IP addresses can be turned on under OpsWorks -> Layers -> Network.

    Also, here is the AWS NAT Instances Documentation.
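To spot this misconfiguration across many layers without clicking through the console, the layer settings can also be inspected programmatically. The sketch below is one possible approach, not an official procedure: it assumes boto3 is installed with valid credentials, that the stack lives in us-east-1, and the function names `layers_without_public_ip` and `fetch_layers` are hypothetical. The `AutoAssignPublicIps` and `AutoAssignElasticIps` keys are the fields returned by the OpsWorks DescribeLayers call.

```python
def layers_without_public_ip(layers):
    """Return names of layers that auto-assign neither a public nor an Elastic IP.

    `layers` is a list of dicts shaped like the "Layers" entries returned by
    the OpsWorks DescribeLayers API.
    """
    return [
        layer.get("Name", layer.get("LayerId", "<unnamed>"))
        for layer in layers
        if not layer.get("AutoAssignPublicIps")
        and not layer.get("AutoAssignElasticIps")
    ]


def fetch_layers(stack_id, region="us-east-1"):
    """Fetch layer descriptions for a stack (region default is an assumption)."""
    import boto3  # imported lazily so the helper above works without boto3

    client = boto3.client("opsworks", region_name=region)
    return client.describe_layers(StackId=stack_id)["Layers"]
```

For example, `layers_without_public_ip(fetch_layers(stack_id))` lists the layers to review. A layer in that list is not necessarily broken: instances in a private subnet that route through a NAT instance can still reach OpsWorks and the package repositories.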