terraform virtual-machine terraform-provider-aws terraform-provider-gcp iaas

How can I abstract compute resource specifications across cloud providers in Terraform?

I've been researching Terraform's capabilities, and I've encountered differing opinions on whether it is truly cloud-agnostic. While Terraform itself doesn't claim "write once, run everywhere," it does support multi-cloud compatibility, which I consider a form of cloud-agnosticism. I understand that I need to define configurations declaratively for each cloud provider, which is not an issue for me as long as Terraform supports abstraction through high-level configuration variables.

For example, I want to configure a VM with the following properties for both GCP and AWS:

CPU: 2
RAM: 3 GB
Disk: 20 GB

These properties are common to VMs across cloud providers. However, I understand that each provider's predefined VM types (e.g., GCE machine types and EC2 instance types) may not include an exact match for these specifications. For instance, one provider may not offer a predefined type with exactly 2 CPUs and 3 GB of RAM, while another might.

My goal is to define these general variables and have Terraform automatically select the closest available VM type for each provider. Is there a way to write Terraform code that abstracts these high-level configurations and maps them to the appropriate VM types for GCP and AWS? Ideally, I want to avoid hardcoding the mapping between these configurations and the specific instance types for each provider. Is there a plugin or method in Terraform that supports this kind of abstraction?

Additionally, could you provide a simple example demonstrating how to achieve this abstraction for both GCP and AWS, based on the specified CPU, RAM, and Disk requirements?

Solution

An important requirement for your goal is having access to all of the available instance types for the target platform so that you can select the one that fits best. Terraform itself does not have that information, so any solution for this framing of the problem is something you would need to build for yourself outside of Terraform.

In discussions with other Terraform users, a common compromise I've seen is for an organization to define its own set of cloud-agnostic instance types, each with a particular set of characteristics, and then manually maintain a lookup table from those organization-specific instance types to each platform's real instance types.

To apply that to your situation, you'd need to choose a symbolic name for "2 CPUs, 3 GB RAM, 20 GB disk" that makes sense to your team (for the sake of this example I'll call it 2c-3r-20d), then decide for each target platform which of its instance types is the best match, and declare that lookup table as a map in Terraform.

```hcl
locals {
  aws_instance_types = tomap({
    "2c-3r-20d" = "m7i.large"
    # (and any others you want to define)
  })
}

locals {
  gcp_instance_types = tomap({
    "2c-3r-20d" = "n4-standard-2"
    # (and any others you want to define)
  })
}
```

You can then encourage others in your organization to describe their needs in terms of your custom instance type names instead of the platform-specific ones, and use your Terraform configuration to translate each name into the platform-specific instance type you manually identified as the closest match.
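A minimal sketch of that translation step follows, assuming the aws_instance_types and gcp_instance_types locals above live in the same module and that the AWS and Google providers are configured elsewhere. The instance_class and aws_ami_id variables, the zone, and the boot image are hypothetical choices for illustration, not anything Terraform prescribes:

```hcl
# Assumes the aws_instance_types and gcp_instance_types locals shown above
# are in this module, and that the aws and google providers are configured
# elsewhere.

variable "instance_class" {
  type        = string
  description = "Organization-defined instance class, such as \"2c-3r-20d\"."
  default     = "2c-3r-20d"
}

variable "aws_ami_id" {
  type        = string
  description = "AMI to boot on EC2 (hypothetical input for this sketch)."
}

resource "aws_instance" "example" {
  ami = var.aws_ami_id

  # Translate the organization's class name into an EC2 instance type.
  # A class name missing from the map fails with an "Invalid index" error.
  instance_type = local.aws_instance_types[var.instance_class]
}

resource "google_compute_instance" "example" {
  name = "example"
  zone = "us-central1-a" # assumption: any zone offering the machine type

  # Translate the same class name into a Compute Engine machine type.
  machine_type = local.gcp_instance_types[var.instance_class]

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-12"
      # Assumption: N4 machine types only support Hyperdisk boot disks.
      type = "hyperdisk-balanced"
    }
  }

  network_interface {
    network = "default"
  }
}
```

With this shape, adding a new class only means adding one entry to each map; the resource blocks stay the same.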

(Note that both EC2 and Compute Engine treat storage size as mostly independent of the instance type, because they model disks as separate objects attached to the VM rather than as part of the VM. You could still bundle disk size into your own instance types if you want, but then you'd need a more complex lookup table for each platform that specifies both the instance type and the settings for the root storage volume.)
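If you do decide to bundle disk size, one way to sketch that richer lookup table is a map of objects per platform, used in place of the simpler string maps. This is a standalone sketch under stated assumptions: the variable names, zone, image, and the Hyperdisk boot disk type for N4 are illustrative choices, and provider configuration is omitted:

```hcl
# Standalone alternative to the simple string maps: each class also carries
# the root/boot disk size, which both platforms configure on a separate disk.

variable "instance_class" {
  type    = string
  default = "2c-3r-20d"
}

variable "aws_ami_id" {
  type = string # hypothetical input for this sketch
}

locals {
  aws_instance_classes = tomap({
    "2c-3r-20d" = {
      instance_type    = "m7i.large"
      root_volume_size = 20 # GiB
    }
  })

  gcp_instance_classes = tomap({
    "2c-3r-20d" = {
      machine_type   = "n4-standard-2"
      boot_disk_size = 20 # GB
    }
  })

  # Look up the selected class once per platform.
  aws_class = local.aws_instance_classes[var.instance_class]
  gcp_class = local.gcp_instance_classes[var.instance_class]
}

resource "aws_instance" "example" {
  ami           = var.aws_ami_id
  instance_type = local.aws_class.instance_type

  # The root EBS volume is sized separately from the instance type.
  root_block_device {
    volume_size = local.aws_class.root_volume_size
  }
}

resource "google_compute_instance" "example" {
  name         = "example"
  zone         = "us-central1-a" # assumption
  machine_type = local.gcp_class.machine_type

  # The boot disk is likewise a separate object with its own size.
  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-12"
      size  = local.gcp_class.boot_disk_size
      type  = "hyperdisk-balanced" # assumption: required for N4 machine types
    }
  }

  network_interface {
    network = "default"
  }
}
```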