
Rolling deployment of Play Framework webapp on AWS ECS


I want to deploy a Play web application on AWS ECS. I have created a cluster of 2 EC2 instances running the web service. Each instance is running a single task. The cluster is load balanced by an AWS ELB.

After pushing a new Docker image to the repository, I create a new revision of my task definition that references the image tagged `latest`. When I update the service to use the new task definition, all of the EC2 instances replace their task immediately. Even though I have 2 EC2 instances, I experience downtime because AWS updates all instances at the same time instead of updating one instance after the other in a rolling deployment.

I tried to launch more than one task per EC2 instance, but this is not possible because each task binds Play's default port 9000, and I get a "port already in use" error message in the events tab of the service.

I can think of two possible solutions:

  • Each task instance uses a dynamic port instead of the default port. (How can I configure a load balancer to "search" for the dynamic ports?)
  • Create two separate ECS clusters, put them both behind a single load balancer target group, and manually update the task definition of the second cluster after the first has finished updating. (How can this be automated - or can it be automated at all?)
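For what it's worth, the first option is supported natively by ECS: setting `hostPort` to 0 in the task definition's port mapping lets Docker pick an ephemeral host port for each task, and an Application Load Balancer target group tracks each task's actual port automatically, so no "searching" is needed. A minimal sketch (family, container name, account ID, and region below are placeholders):

```shell
# Register a task definition revision that uses dynamic host port mapping.
# hostPort 0 means Docker assigns an ephemeral port on the instance, so
# several tasks of the same definition can run on one EC2 instance.
aws ecs register-task-definition \
  --cli-input-json '{
    "family": "play-webapp",
    "containerDefinitions": [{
      "name": "play-webapp",
      "image": "123456789012.dkr.ecr.eu-west-1.amazonaws.com/play-webapp:latest",
      "memory": 512,
      "portMappings": [{ "containerPort": 9000, "hostPort": 0 }]
    }]
  }'
```

When the service is associated with an ALB target group (target type `instance`), ECS registers and deregisters each task's dynamic port with the target group as tasks start and stop; the classic ELB, by contrast, only supports fixed ports.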

Is one of these solutions the way to go or are there any other solutions or best practices for this scenario?


Solution

  • Okay, I found the sources of the problem. First of all, I had mixed up the terminology a bit. The docs for AWS Elastic Beanstalk explain the difference between a rolling update and a rolling deployment. I was trying to achieve a rolling deployment, not a rolling update - I will edit my question accordingly.

    In addition, I noticed that I had changed the "Minimum healthy percent" setting of the service and inadvertently saved it at 0%. ECS therefore correctly felt free to replace the tasks on more than one EC2 instance at the same time, leaving no healthy tasks in my cluster. I found a very helpful video here in which the author walks through upgrading an ECS cluster and then downgrading it again. Everything works as expected now.
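    For reference, the deployment configuration that produces a rolling deployment can also be set when updating the service from the CLI. A sketch, assuming the two-task setup described above (cluster, service, and task definition names are placeholders):

    ```shell
    # Trigger a rolling deployment while keeping at least one task healthy.
    # With minimumHealthyPercent=50 and maximumPercent=100, ECS stops one of
    # the two tasks, starts its replacement, waits for it to pass health
    # checks, and only then replaces the other task.
    aws ecs update-service \
      --cluster play-cluster \
      --service play-webapp-service \
      --task-definition play-webapp \
      --deployment-configuration "minimumHealthyPercent=50,maximumPercent=100"
    ```

    A minimum healthy percent of 0, as saved by accident above, instead permits ECS to stop every running task at once, which is exactly the downtime observed.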