amazon-web-services terraform autoscaling amazon-ecs blue-green-deployment

How to implement blue/green deployments in AWS with Terraform without losing capacity


I have seen multiple articles discussing blue/green deployments, and they consistently involve forcing recreation of the Launch Configuration and the Auto Scaling group (ASG). For example:

https://groups.google.com/forum/#!msg/terraform-tool/7Gdhv1OAc80/iNQ93riiLwAJ

This works well in general, except that the desired capacity of the recreated ASG is reset to its default. If my cluster is under load at the time, there is a sudden drop in capacity.

My question is this: is there a way to execute a Terraform blue/green deployment without a loss of capacity?


Solution

  • I don't have a full Terraform-only solution to this.

    The approach I use is to run a small script (a Makefile target) that reads the current desired capacity, stores it in a variable, and then uses that variable in the ASG definition.

    # Recipe lines must start with a tab character in a real Makefile.
    # make expands the whole recipe, including $(eval ...), before running any
    # line of it, so the AWS lookup happens even when the env check fails.
    # POSIX sh needs "=" (not "==") inside [ ] and an exit status in 0..255.
    handle-desired-capacity:
        @echo "Handling current desired capacity"
        @echo "---------------------------------"
        @if [ "$(env)" = "" ]; then \
            echo "Cannot continue without an environment"; \
            exit 1; \
        fi
        $(eval DESIRED_CAPACITY := $(shell aws autoscaling describe-auto-scaling-groups --profile $(env) | jq -SMc '.AutoScalingGroups[] | select((.Tags[]|select(.Key=="Name")|.Value) | match("prod-asg-app")).DesiredCapacity'))
        @if [ "$(DESIRED_CAPACITY)" = "" ]; then \
            echo "Could not determine desired capacity."; \
            exit 1; \
        fi
        @if [ "$(DESIRED_CAPACITY)" -lt 2 ] || [ "$(DESIRED_CAPACITY)" -gt 10 ]; then \
            echo "Can only deploy between 2 and 10 instances."; \
            exit 1; \
        fi
        @echo "Desired Capacity is $(DESIRED_CAPACITY)"
        @sed -i.bak 's!desired_capacity = [0-9]*!desired_capacity = $(DESIRED_CAPACITY)!g' $(env)/terraform.tfvars
        @rm -f $(env)/terraform.tfvars.bak
        @echo ""
    

    Clearly, this is as ugly as it gets, but it does the job.

    I am looking into whether the name of the ASG can be exposed as an output in the remote state, so that the next run can use it to look up the desired capacity, but I don't yet understand remote state well enough to make this useful.
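
The idea of reading the ASG name from state could look roughly like the sketch below. It is untested against a real account and rests on several assumptions: the configuration exposes a Terraform output named `asg_name` and takes an input variable named `desired_capacity` (both names are hypothetical), each environment directory shares its name with the AWS CLI profile as in the Makefile, and the installed Terraform supports `-chdir` and `output -raw`. The 2 to 10 bounds are carried over from the Makefile check.

```shell
#!/bin/sh
# Sketch only: "asg_name" (Terraform output) and "desired_capacity"
# (Terraform variable) are hypothetical names; adjust them to your setup.

MIN_CAPACITY=2    # bounds taken from the Makefile check
MAX_CAPACITY=10

# Succeeds only when $1 is a whole number within the allowed bounds.
validate_capacity() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;    # empty or non-numeric, e.g. "None"
    esac
    [ "$1" -ge "$MIN_CAPACITY" ] && [ "$1" -le "$MAX_CAPACITY" ]
}

# Prints the desired capacity of the ASG named $1, using AWS profile $2.
current_capacity() {
    aws autoscaling describe-auto-scaling-groups \
        --profile "$2" \
        --auto-scaling-group-names "$1" \
        --query 'AutoScalingGroups[0].DesiredCapacity' \
        --output text
}

# $1 is the environment directory, assumed to double as the profile name.
# Reads the ASG name recorded by the previous apply, looks up its live
# capacity, and passes that value to the next apply.
deploy() {
    env_dir=$1
    asg_name=$(terraform -chdir="$env_dir" output -raw asg_name) || return 1
    capacity=$(current_capacity "$asg_name" "$env_dir") || return 1
    if ! validate_capacity "$capacity"; then
        echo "Refusing to deploy with desired capacity '$capacity'" >&2
        return 1
    fi
    terraform -chdir="$env_dir" apply -var "desired_capacity=$capacity"
}
```

After sourcing the file, `deploy prod` would run the whole sequence. Passing the value with `-var` avoids rewriting `terraform.tfvars` with sed, and looking the group up by name avoids matching on the `Name` tag. On a first deployment the output does not exist yet, so `deploy` stops before applying and the capacity has to be supplied some other way.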