amazon-ec2 mesos terraform dcos

Is it possible to ask Terraform to destroy AWS nodes with known IPs?


We use Terraform to create and destroy a Mesos DC/OS cluster on AWS EC2. The number of agent nodes is defined in a variable.tf file:

variable "instance_counts" {
  type = "map"
  default = {   
    master       = 1
    public_agent = 2 
    agent        = 5 
  }
}

Once the cluster is up, you can add or remove agent nodes by changing the number of agents in that file and applying again. Terraform is smart enough to recognize the difference and act accordingly. When it destroys nodes, it tends to go for the highest-numbered nodes. For example, if I have an 8-node dcos cluster and want to terminate 2 of the agents, Terraform would take down dcos_agent_node-6 and dcos_agent_node-7.

What if I want to destroy an agent with a particular IP? Terraform must be aware of the IPs because it knows the order of the instances. How do I hack Terraform to remove agents by providing the IPs?


Solution

  • I think you're misunderstanding how Terraform works.

    Terraform takes your configuration and builds a dependency graph of how to create the resources it describes. If it has a state file, it then overlays information from the provider (such as AWS) to see what has already been created and is managed by Terraform, and removes those resources from the plan. It may also create destroy plans for resources that exist in the provider and the state file but are no longer in the configuration.

    So if you have a configuration with a 6 node cluster and a fresh field (no state file, nothing built by Terraform in AWS) then Terraform will create 6 nodes. If you then set it to have 8 nodes, Terraform will attempt to build a plan containing 8 nodes, realise it already has 6 and then create a plan to add the 2 missing nodes. When you then change your configuration back to 6 nodes, Terraform will build a plan with 6 nodes, realise you have 8 nodes and create a destroy plan for nodes 7 and 8.
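    The index-based reconciliation described above can be sketched in a few lines of Python. This is only a toy model of the behaviour, not Terraform's actual implementation; the `plan` function, the `aws_instance.agent` resource name and the placeholder instance IDs are all assumptions made for illustration.

```python
def index_of(address):
    """Return the numeric count index at the end of an address."""
    return int(address.rsplit(".", 1)[1])


def plan(resource, desired_count, state):
    """Return (to_create, to_destroy) for a counted resource.

    `state` maps addresses such as "aws_instance.agent.0" to instance IDs.
    This is a simplified model: only the index decides what is kept.
    """
    desired = {f"{resource}.{i}" for i in range(desired_count)}
    existing = {addr for addr in state if addr.startswith(resource + ".")}
    to_create = sorted(desired - existing, key=index_of)
    to_destroy = sorted(existing - desired, key=index_of)
    return to_create, to_destroy


# Hypothetical state: 8 agents with placeholder instance IDs.
state = {f"aws_instance.agent.{i}": f"i-placeholder{i}" for i in range(8)}

# Shrinking from 8 to 6 removes the two highest indexes.
create, destroy = plan("aws_instance.agent", 6, state)
print(destroy)  # ['aws_instance.agent.6', 'aws_instance.agent.7']
```

    In this model nothing about an instance (its IP, its health) influences the outcome; only its position in the count does, which is why the highest-numbered nodes are always the ones removed.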

    To try and get it to do anything different to that would involve some horrible hacking of the state file so that it thinks that nodes 7 and 8 are different to the ones most recently added by Terraform.

    As an example your state file might look something like this:

    {
        "version": 3,
        "terraform_version": "0.8.1",
        "serial": 1,
        "lineage": "7b565ca6-689a-4aab-a3ec-a1ed77e83678",
        "modules": [
            {
                "path": [
                    "root"
                ],
                "outputs": {},
                "resources": {
                    "aws_instance.test.0": {
                        "type": "aws_instance",
                        "depends_on": [],
                        "primary": {
                            "id": "i-01ee444f57aa32b8e",
                            "attributes": {
                                ...
                            },
                            "meta": {
                                "schema_version": "1"
                            },
                            "tainted": false
                        },
                        "deposed": [],
                        "provider": ""
                    },
                    "aws_instance.test.1": {
                        "type": "aws_instance",
                        "depends_on": [],
                        "primary": {
                            "id": "i-07c1999f1109a9ce2",
                            "attributes": {
                                ...
                            },
                            "meta": {
                                "schema_version": "1"
                            },
                            "tainted": false
                        },
                        "deposed": [],
                        "provider": ""
                    }
                },
                "depends_on": []
            }
        ]
    }
    

    If I wanted to go back to a single instance instead of 2, Terraform would attempt to remove the i-07c1999f1109a9ce2 instance, as the configuration is telling it that aws_instance.test.0 should exist but aws_instance.test.1 should not. To get it to remove i-01ee444f57aa32b8e instead, I could edit my state file to flip the two around, and Terraform would then think that i-01ee444f57aa32b8e is the instance that should be removed.
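    To make the effect of that flip concrete, here is a small Python model of the two state entries above. It is a sketch under stated assumptions, not a tool for editing real state files: the helper names `ids_to_destroy` and `swap_entries` are hypothetical, and a real state entry carries attributes and metadata that this model ignores.

```python
def index_of(address):
    """Return the numeric count index at the end of an address."""
    return int(address.rsplit(".", 1)[1])


def ids_to_destroy(state, desired_count):
    """Instance IDs stored at indexes the configuration no longer wants."""
    return [
        state[addr]
        for addr in sorted(state, key=index_of)
        if index_of(addr) >= desired_count
    ]


def swap_entries(state, a, b):
    """Return a copy of the state with the entries at a and b exchanged."""
    swapped = dict(state)
    swapped[a], swapped[b] = state[b], state[a]
    return swapped


# The two instances from the example state file.
state = {
    "aws_instance.test.0": "i-01ee444f57aa32b8e",
    "aws_instance.test.1": "i-07c1999f1109a9ce2",
}

# Going from 2 instances to 1 targets whatever sits at index 1.
print(ids_to_destroy(state, 1))  # ['i-07c1999f1109a9ce2']

# After flipping the entries, the other instance sits at index 1.
flipped = swap_entries(state, "aws_instance.test.0", "aws_instance.test.1")
print(ids_to_destroy(flipped, 1))  # ['i-01ee444f57aa32b8e']
```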

    However, you're getting into very difficult territory as soon as you start hacking the state file like that. While it's something you can do (and occasionally may need to), you should seriously consider how you are working if this is anything other than a one-off case for a special reason (such as moving raw resources into modules, now made easier with Terraform's state mv command).

    In your case I'd question why you need to remove two specific nodes in a Mesos cluster rather than just specifying the size of the Mesos cluster. If it's a case of a specific node being bad then I'd always terminate it and allow Terraform to build me a fresh, healthy one anyway.