amazon-web-services amazon-ec2 terraform aws-application-load-balancer

How to Manage Terraform State Across Multiple Workspaces and Environments Without Causing Downtime in AWS?

I've been working on a complex Terraform setup where:

- We have multiple environments (e.g., dev, staging, and prod), each using different workspaces.
- Each workspace has its own state file and backend configuration (S3 + DynamoDB lock for prod, and local state for dev).
- We use several shared modules across these environments, such as VPCs, EC2 instances, RDS databases, and security groups, with configurations varying between environments.
- For production (prod), I need to ensure zero downtime when updating critical resources like ALBs, EC2 instances, and RDS.

We're running into issues with state consistency when changes are applied across environments. For example, when applying changes in dev, resources tied to shared modules inadvertently affect prod.
Additionally, we need to ensure that changes in prod are staged in such a way that they are rolled out with blue-green deployment patterns (e.g., for EC2 instances) while still using the same Terraform modules.

Key challenges:

1. How to safely and consistently handle Terraform state across environments with multiple workspaces and varying backend configurations?
2. What’s the best way to manage shared resources (e.g., VPC, RDS) between environments without risk of one environment's state interfering with another's?
3. How to implement a blue-green deployment pattern using Terraform for critical services like ALB and EC2, while ensuring that no downtime occurs during the cutover?
4. I've explored options like using terraform import for shared resources, but I'm concerned about the maintainability and safety of this approach.
5. How can we ensure that each environment is isolated yet still shares common modules, and how should state be handled in this scenario?

Solution

To answer these questions in order:

1. There are multiple ways to set boundaries between environments, but I would not recommend Terraform workspaces unless you know what you are doing. Try to set the boundaries in AWS instead. A good approach is to put the dev, acc and prd environments in separate AWS accounts. If you set permissions correctly, this gives your deployments a natural, safe boundary.
2. You could create a separate environment called shared and give the other environments read-only access to its state file. Those environments can then read its outputs with a terraform_remote_state data source, but cannot change the state.
3. There is an excellent tutorial about this specific topic on the HashiCorp website.
4. See point 2 for my preferred way of sharing resources between environments.
5. Each environment can be isolated at the AWS account level, as recommended under point 1. This is one of the easiest ways to isolate resources:
- Run terraform init with a different backend configuration for each environment. Each environment stores its state in a different S3 bucket in a different account, so you get full isolation.
- For the terraform plan and terraform apply commands, supply a different tfvars file for each environment, so that each environment's configuration is kept separate.
- You then have a common set of Terraform code that is the same for all environments. The only thing that differs is the input variables in the tfvars files.
- You can make minor deviations between environments with the count meta-argument. In production you might want more EC2 instances than in development, for example. But be careful, because deviations make the environments less predictable.
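
The per-environment workflow in the bullets above (one code base, plus a backend configuration and a tfvars file per environment) could be sketched roughly as follows. This is only an illustration: the file names, bucket and table names, variable names, region, instance type and AMI ID are all made up for the example and are not taken from the question.

```hcl
# main.tf: the common code, identical for every environment.

terraform {
  # Partial backend configuration. The bucket, key, region and lock table
  # are supplied per environment at init time, for example:
  #   terraform init -backend-config=backends/prd.hcl
  backend "s3" {}
}

variable "environment" {
  type = string
}

variable "instance_count" {
  type    = number
  default = 1
}

variable "ami_id" {
  type = string
}

# The same resource everywhere; only the count differs per environment.
resource "aws_instance" "app" {
  count         = var.instance_count
  ami           = var.ami_id
  instance_type = "t3.micro" # assumed instance type

  tags = {
    Name = "app-${var.environment}-${count.index}"
  }
}

# backends/prd.hcl (hypothetical file, one per environment):
#   bucket         = "example-prd-tfstate"
#   key            = "app/terraform.tfstate"
#   region         = "eu-west-1"
#   dynamodb_table = "example-prd-tf-locks"
#
# env/prd.tfvars (hypothetical file, one per environment):
#   environment    = "prd"
#   instance_count = 3
#   ami_id         = "ami-0123456789abcdef0"   # placeholder, not a real AMI
#
# Plan and apply with the matching variables file:
#   terraform plan  -var-file=env/prd.tfvars
#   terraform apply -var-file=env/prd.tfvars
```

If you switch between environments in the same working directory, you will most likely want terraform init -reconfigure together with the new -backend-config file, so that Terraform does not offer to migrate state from one environment's backend to another.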
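
For the shared environment whose state the other environments can only read, a minimal sketch of the consuming side might look like this. It assumes the shared configuration stores its state in S3 and exports an output named vpc_id. The bucket, key, region, output name and security group are hypothetical.

```hcl
# In a consuming environment (e.g. dev): read the outputs of the shared state.
data "terraform_remote_state" "shared" {
  backend = "s3"

  config = {
    bucket = "example-shared-tfstate"   # assumed bucket name
    key    = "shared/terraform.tfstate" # assumed state key
    region = "eu-west-1"                # assumed region
  }
}

# Use a value that the shared configuration is assumed to expose, e.g.
#   output "vpc_id" { value = aws_vpc.main.id }
resource "aws_security_group" "app" {
  name   = "app"
  vpc_id = data.terraform_remote_state.shared.outputs.vpc_id
}
```

Terraform itself does not make this access read-only. That part would come from IAM, for example by granting the consuming environment's role only s3:GetObject and s3:ListBucket on the shared state bucket, so it can read the outputs but never write the state.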