
Configuring Envoy to use SRV records generated by AWS ECS and Route53


I'm using AWS ECS to deploy multiple web services (as Docker images) behind an Envoy front proxy. Some of these Docker images have multiple deployed instances.

I'm currently using the service discovery features of ECS to generate DNS records so my services are discoverable. All of this works as expected.

I was initially using the awsvpc network mode with A records for service discovery. However, I soon hit the network interface limit (I started getting 'Not enough ENI' errors), so I've switched to bridge networking and I'm trying out service discovery using SRV records.

The problem I've run into is that Envoy doesn't seem to support SRV records for service discovery. If it does, what changes do I need to make to my setup? I've included the relevant portion of my cluster configuration:

  clusters:
  - name: ms_auth
    connect_timeout: 0.25s
    type: strict_dns
    lb_policy: round_robin
    hosts:
    - socket_address:
        address: ms_auth.apis
        port_value: 80
  - name: ms_logging
    connect_timeout: 0.25s
    type: strict_dns
    lb_policy: round_robin
    hosts:
    - socket_address:
        address: ms_logging.apis
        port_value: 80

Failing that, what other options should I consider in getting this setup to work?


Solution

  • Posting the solution I ended up going with.

    I set up Consul to work as the discovery service. A Consul sidecar runs alongside every cluster/web service I have. When a web service comes online, it registers itself with the Consul server. This way, only the Consul server name needs to be known.
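The registration step could look roughly like the sketch below. It is not the exact code from this setup. It assumes the sidecar is a Consul agent listening on the default HTTP port 8500 and uses Consul's agent endpoint `PUT /v1/agent/service/register`. The agent URL, the ID scheme, and the instance address are hypothetical.

```python
import json
import urllib.request

# Assumption: the Consul sidecar agent listens locally on the default HTTP port.
CONSUL_AGENT = "http://127.0.0.1:8500"


def build_registration(name, address, port):
    """Build the JSON body for PUT /v1/agent/service/register."""
    return {
        "Name": name,
        # Hypothetical ID scheme so that each instance gets a unique ID.
        "ID": f"{name}-{address}-{port}",
        "Address": address,
        "Port": port,
    }


def register_service(name, address, port, agent=CONSUL_AGENT):
    """Register one service instance with the local Consul agent."""
    body = json.dumps(build_registration(name, address, port)).encode("utf-8")
    request = urllib.request.Request(
        f"{agent}/v1/agent/service/register",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status == 200


# Example call (hypothetical address), run when the web service starts:
# register_service("ms-auth", "10.0.1.15", 80)
```

Registering under the name `ms-auth` is what makes the instance resolvable as `ms-auth.service.consul`.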

    Once a service is registered, you can either query Consul to get the IP for the web service, or access it directly by DNS name in the form <webservice_name>.service.consul.
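For the "query Consul" option, one possibility is Consul's HTTP health endpoint, `/v1/health/service/<name>`. The sketch below assumes a local agent on port 8500 and only extracts address and port pairs. The function names and sample data are hypothetical.

```python
import json
import urllib.request


def parse_instances(entries):
    """Extract (address, port) pairs from a /v1/health/service/<name> response."""
    instances = []
    for entry in entries:
        service = entry["Service"]
        # If the service has no address of its own, use the node's address.
        address = service.get("Address") or entry["Node"]["Address"]
        instances.append((address, service["Port"]))
    return instances


def lookup(name, agent="http://127.0.0.1:8500"):
    """Return the instances of a service that are passing their health checks."""
    url = f"{agent}/v1/health/service/{name}?passing=true"
    with urllib.request.urlopen(url, timeout=5) as response:
        return parse_instances(json.load(response))


# Example call against a running agent:
# lookup("ms-auth")
```

Unlike an A record lookup, this response carries the port for each instance, which is the information the SRV records were meant to provide.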

    The only change I had to make to the Envoy config was to point at the Consul server IP for DNS resolution (see below).

    clusters:
      - name: ms_auth
        connect_timeout: 0.25s
        type: strict_dns
        lb_policy: round_robin
        hosts:
        - socket_address:
            address: ms-auth.service.consul
            port_value: 80
        dns_resolvers:
        - socket_address:
            address: {DNS_RESOLVER_IP}
            port_value: 8600
    
      - name: ms_logging
        connect_timeout: 0.25s
        type: strict_dns
        lb_policy: round_robin
        hosts:
        - socket_address:
            address: ms-logging.service.consul
            port_value: 80
        dns_resolvers:
        - socket_address:
            address: {DNS_RESOLVER_IP}
            port_value: 8600
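
The `{DNS_RESOLVER_IP}` placeholder has to be replaced with the real Consul address before Envoy loads the file. How that was done in this setup is not stated. The sketch below shows one hypothetical way to render the same cluster blocks from a template. The function name, the template, and the example IP are assumptions.

```python
# Template mirroring the cluster block above; the field names come from that config.
CLUSTER_TEMPLATE = """\
- name: {name}
  connect_timeout: 0.25s
  type: strict_dns
  lb_policy: round_robin
  hosts:
  - socket_address:
      address: {service}.service.consul
      port_value: 80
  dns_resolvers:
  - socket_address:
      address: {resolver_ip}
      port_value: 8600
"""


def render_clusters(services, resolver_ip):
    """Render the clusters section for (cluster_name, consul_service_name) pairs."""
    blocks = [
        CLUSTER_TEMPLATE.format(name=name, service=service, resolver_ip=resolver_ip)
        for name, service in services
    ]
    return "clusters:\n" + "".join(blocks)


# Example (hypothetical resolver IP):
# print(render_clusters([("ms_auth", "ms-auth"), ("ms_logging", "ms-logging")], "10.0.0.2"))
```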