amazon-web-services amazon-ecs aws-fargate aws-service-connect ecs-service-connect

AWS ECS Fargate - Service Connect not updating /etc/hosts


I have two services running on ECS: service_1 and service_2. Service_2 uses service_1. Both services are in the same namespace.

The Service Connect configuration of service_1 sets a DNS name and port in the client alias section. Let's say the DNS name is set to service1.

Deployment is done using Terraform.

When both services are deployed in parallel, /etc/hosts in the service_1 container gets an entry similar to 127.255.0.1 service1, so it can discover itself. In service_2, however, no such entry is created in /etc/hosts, so it cannot find service_1 by hostname.

The "solution" is to force a redeployment of service_2. Once redeployed, /etc/hosts in service_2 has a new entry pointing to service1.

My guess is that AWS Service Connect does not automatically refresh /etc/hosts in already running services, which in a way defeats the purpose of this service.

Since I only have a one-way dependency, I can use depends_on in Terraform so that service_1 is up and running before service_2 is deployed. But what if I had a bi-directional dependency? Would I need to create both, then force-recreate one so that they can find each other?

EDIT

Relevant Terraform code snippets:

resource "aws_service_discovery_private_dns_namespace" "this" {
  name = "aws.internal"
  <...>
}

module "ecs_cluster" {
  source = "terraform-aws-modules/ecs/aws//modules/cluster"

  cluster_service_connect_defaults = {
    namespace = aws_service_discovery_private_dns_namespace.this.arn
  }
  <...>
}

module "service1" {
  source = "terraform-aws-modules/ecs/aws//modules/service"

  cluster_arn = module.ecs_cluster.arn

  <...>

  container_definitions = {
    <...>
    service1 = {
      <...>
      port_mappings = [
        {
          name          = "service1"
          containerPort = 8888
          hostPort      = 8888
          protocol      = "tcp"
        }
      ]
    }
  }

  service_connect_configuration = {
    enabled = true
    service = {
      client_alias = {
        port     = 8888
        dns_name = "service1"
      }
      port_name      = "service1"
      discovery_name = "service1"
    }
  }
}

Solution

  • Unfortunately, this is expected behaviour for ECS Service Connect.

    Existing services must be redeployed before the applications in them can resolve new endpoints. New endpoints that are added to the namespace after the most recent deployment won't be added to the task configuration.

    Source: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/service-connect.html
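
    If the redeployment should be driven from Terraform rather than done by hand, a minimal sketch using the AWS provider's aws_ecs_service resource directly could look like the following. The resource name service2, the task definition reference and desired_count are assumptions for illustration; whether the terraform-aws-modules service module exposes the same arguments should be checked against its inputs. plantimestamp() requires Terraform 1.5 or later.

    ```hcl
    resource "aws_ecs_service" "service2" {
      name            = "service2"                             # hypothetical service name
      cluster         = module.ecs_cluster.arn
      task_definition = aws_ecs_task_definition.service2.arn   # hypothetical task definition
      desired_count   = 1                                      # assumption

      # Ask ECS to start a new deployment when the triggers below change.
      force_new_deployment = true

      # A value that changes on every plan, so every apply redeploys the
      # service and its tasks pick up the endpoints currently in the namespace.
      triggers = {
        redeployment = plantimestamp()
      }
    }
    ```

    Note that this redeploys service2 on every apply, which is heavy-handed; a trigger value derived from the services it depends on would narrow that, at the cost of more wiring.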


    I can utilise depends_on in Terraform, so that service_1 is up and running, before service_2 is deployed.

    This is what AWS recommends as well, as described in Service Connect Concept: Deployment Order:

    First, assume that you are creating an application that is available to the public internet in a single AWS CloudFormation template and single AWS CloudFormation stack.

    The public discovery and reachability should be created last by AWS CloudFormation, including the frontend client service. The services need to be created in this order to prevent a time period when the frontend client service is running and available to the public, but a backend isn't. This eliminates error messages from being sent to the public during that time period.

    In AWS CloudFormation, you must use the dependsOn to indicate to AWS CloudFormation that multiple Amazon ECS services can't be made in parallel or simultaneously. You should add the dependsOn to the frontend client service for each backend client-server service that the client tasks connect to.
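
    The Terraform counterpart of that ordering is a module-level depends_on on the client service. A minimal sketch, assuming the client is declared as a module named service2 (a hypothetical name, since the question only shows service1):

    ```hcl
    module "service2" {
      source = "terraform-aws-modules/ecs/aws//modules/service"

      cluster_arn = module.ecs_cluster.arn

      # ... remaining service and container settings elided ...

      # Create service2 only after service1 exists, so that the service1
      # endpoint is already in the namespace when service2's tasks start.
      depends_on = [module.service1]
    }
    ```

    This only covers a one-way dependency; two modules cannot depend on each other, because Terraform rejects dependency cycles.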


    What you can try is to set both services to Client and server in the Service Connect configuration and access each one from the other using its DNS name, e.g. service1.namespace and service2.namespace.

    Client mode connects to other services in the namespace, and client-server mode provides endpoints for this service.

    If you don't provide a DNS name, the Port alias is used in the Service Connect proxy configuration.
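
    In Terraform, giving service_2 its own endpoint (client-server mode) might look like the sketch below, mirroring the service1 block from the question. The module name service2, the port 9999 and the alias service2 are assumptions for illustration. This only shows the shape of the suggestion; per the documentation quoted above, a service that was deployed before the other endpoint existed still needs a redeployment to see it.

    ```hcl
    module "service2" {
      source = "terraform-aws-modules/ecs/aws//modules/service"

      cluster_arn = module.ecs_cluster.arn

      # ... remaining service and container settings elided ...

      service_connect_configuration = {
        enabled = true
        # Declaring a "service" block is what makes this client-server
        # rather than client-only.
        service = {
          client_alias = {
            port     = 9999        # hypothetical port
            dns_name = "service2"  # hypothetical alias other services connect to
          }
          port_name      = "service2" # must match a named port mapping in the task definition
          discovery_name = "service2"
        }
      }
    }
    ```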


    Edit

    Your namespace's instance discovery should be set to "API calls and DNS queries in VPCs".
