docker-compose docker-swarm docker-network

Docker swarm network latency with mesh and DNSRR

I have a 3 node docker swarm.

One stack deployed is a database cluster with 3 replicas. (MariaDB Galera)

Another stack deployed is a web application with 2 replicas.

The web application looks like this:

version: '3'

networks:
  web:
    external: true
  galera_network:
    external: true

services:
  application:
    image: webapp:latest
    networks:
      - galera_network
      - web
    environment:
      DB_HOST: galera_node
    deploy:
      replicas: 2

FWIW, the web network is what traefik is hooked up to.

The issue is that galera_node (the webapp's database host) resolves to a VIP that, as far as I can tell, relies on swarm's mesh routing. That adds latency whenever the traffic is routed over the WAN instead of going to the galera_node container deployed on the same physical host.

Another option I've found is to use tasks.galera_node, but that seems to use DNSRR for the 3 galera cluster containers. So 33% of the time, things are good and fast... but the rest of the time, I have unnecessary latency added to the mix.
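
To see every address behind tasks.galera_node at once instead of one per ping, something like the following might help. It is only a sketch: unique_ips is a helper name made up for this example, and it assumes getent is available in the webapp image.

```shell
#!/bin/sh
# Reduce `getent ahosts` output (one line per address and socket type)
# to a sorted list of unique IP addresses, read from stdin.
unique_ips() {
    awk 'NF { print $1 }' | sort -u
}

# Inside a container attached to galera_network (assumes getent exists):
#   getent ahosts tasks.galera_node | unique_ips
```

With 3 galera replicas this should list three task IPs, while the plain galera_node name should list only the VIP.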

These two behaviors look to be documented as what we'd expect from the different endpoint_mode options. Reference: Docker endpoint_mode

To illustrate the latency, here are pings from within the webapp container. Notice the IP address that each name resolves to, along with the response times.

### hitting VIP that "masks" the fact that there is extra latency 
### behind it depending on where the mesh routing sends the traffic.

root@294114cb14e6:/var/www/html# ping galera_node
PING galera_node (10.0.4.16): 56 data bytes
64 bytes from 10.0.4.16: icmp_seq=0 ttl=64 time=0.520 ms
64 bytes from 10.0.4.16: icmp_seq=1 ttl=64 time=0.201 ms
64 bytes from 10.0.4.16: icmp_seq=2 ttl=64 time=0.153 ms
--- galera_node ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.153/0.291/0.520/0.163 ms

### hitting DNSRR that resolves to worst latency server

root@294114cb14e6:/var/www/html# ping tasks.galera_node
PING tasks.galera_node (10.0.4.241): 56 data bytes
64 bytes from 10.0.4.241: icmp_seq=0 ttl=64 time=60.736 ms
64 bytes from 10.0.4.241: icmp_seq=1 ttl=64 time=60.573 ms
--- tasks.galera_node ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max/stddev = 60.573/60.654/60.736/0.082 ms

### hitting DNSRR that resolves to local galera_node container

root@294114cb14e6:/var/www/html# ping tasks.galera_node
PING tasks.galera_node (10.0.4.242): 56 data bytes
64 bytes from 10.0.4.242: icmp_seq=0 ttl=64 time=0.133 ms
64 bytes from 10.0.4.242: icmp_seq=1 ttl=64 time=0.117 ms
--- tasks.galera_node ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.117/0.125/0.133/0.000 ms

### hitting DNSRR that resolves to other "still too much" latency server

root@294114cb14e6:/var/www/html# ping tasks.galera_node
PING tasks.galera_node (10.0.4.152): 56 data bytes
64 bytes from 10.0.4.152: icmp_seq=0 ttl=64 time=28.218 ms
64 bytes from 10.0.4.152: icmp_seq=1 ttl=64 time=40.912 ms
64 bytes from 10.0.4.152: icmp_seq=2 ttl=64 time=26.293 ms
--- tasks.galera_node ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max/stddev = 26.293/31.808/40.912/6.486 ms
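
The comparison above could be automated roughly as follows. This is a sketch, not a tested tool: avg_rtt and fastest_ip are names invented for this example, and it only makes sense against the task IPs, since the VIP ping hides the real latency.

```shell
#!/bin/sh
# Print the average RTT (ms) from ping output read on stdin.
# Matches "round-trip min/avg/max/stddev = ..." as in the output above,
# and also the "rtt min/avg/max/mdev = ..." summary line.
avg_rtt() {
    awk -F'=' '/min\/avg\/max/ { split($2, parts, "/"); print parts[2] }'
}

# Read "ip avg_ms" lines on stdin and print the IP with the lowest average.
fastest_ip() {
    awk 'NF == 2 && (best == "" || $2 + 0 < min) { min = $2 + 0; best = $1 }
         END { if (best != "") print best }'
}

# Example use with the three task IPs seen above:
#   for ip in 10.0.4.241 10.0.4.242 10.0.4.152; do
#       printf '%s %s\n' "$ip" "$(ping -c 3 "$ip" | avg_rtt)"
#   done | fastest_ip
```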

The only way I've been able to get decent performance that bypasses the latency is to hard code the IP address of the local container, but that is obviously not a long-term solution as containers should be treated as ephemeral things.

I totally get that I might need to rethink my geographic node locations due to this latency, and there might be some other performance tuning things I can do. It seems like there should be a way to enforce my desired behavior, though.

I essentially want to bypass DNSRR and the VIP/mesh routing behavior when a local container is available to service the given request.

So the question is:

How can I have each replica of my webapp only hit the local swarm host's galera container without hard coding that container's IP address?

Solution

If anyone else is fighting with this sort of issue, here is a solution. I wouldn't call it an "answer" to the actual question; it is more of a workaround than something I'm happy with.

Inside my webapp, I can use galera_node as my database host and it resolves to the mesh routing VIP that I mentioned above. This gives me functionality no matter what, so if my workaround gets tripped up I know that my connectivity is still intact.

I whipped up a little bash script that I can call from a cron job and that gives me the results I want. It could be adapted for other use cases that stem from this same issue.

It takes in three parameters:

$1 = database container name
$2 = database network name
$3 = webapp container name

The script looks for the container name, finds its IP address for the specified network, and then adds that container name and IP address to the webapp container's /etc/hosts file.

This works because the container name is also galera_node (in my case) so adding it to the hosts file just overrides the hostname that docker resolves to the VIP.

As mentioned, I don't love this, but it does seem to work for my purposes and it avoids me having to hardcode IP addresses and manually maintain them. I'm sure there are some scenarios that will require tweaks to the script, but it's a functional starting point.

My script: update_container_hosts.sh

#!/bin/bash
# Usage: update_container_hosts.sh <db container name> <db network name> <webapp container name>
HOST_NAME=$1
HOST_NETWORK=$2
CONTAINER_NAME=$3

# Go template that extracts the container's IP address on the given network.
FMT="{{(index .NetworkSettings.Networks \"$HOST_NETWORK\").IPAddress}}"
# IDs of running containers on this host whose names contain the given strings.
CONTAINERS=$(docker ps -q --filter "name=$CONTAINER_NAME")
HOST_ID=$(docker ps -q --filter "name=$HOST_NAME" | head -n 1)
HOST_IP=""
if [ -n "$HOST_ID" ]
then
    HOST_IP=$(docker inspect --format "$FMT" "$HOST_ID")
fi

echo "--- containers ---"
echo "$CONTAINERS"
echo "------------------"
echo "host: $HOST_NAME"
echo "network: $HOST_NETWORK"
echo "ip: $HOST_IP"
echo "------------------"

for c in $CONTAINERS
do
    if [ -n "$HOST_IP" ]
    then
        # Append the entry only if this exact line is not already present.
        if docker exec "$c" grep -qxF "$HOST_IP $HOST_NAME" /etc/hosts
        then
            echo "$c: Entry already exists in container hosts file.  Skipping."
        else
            docker exec "$c" /bin/sh -c "echo '$HOST_IP $HOST_NAME' >> /etc/hosts"
            echo "$c: Added entry to container hosts file."
        fi
    fi
done
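
One scenario that probably needs a tweak: the script only appends, so if the database container is recreated with a new IP, the older line for the same name stays in the hosts file and may still be picked up. A possible refinement is to replace the entry rather than append to it. The helper below is a sketch with a made-up name (upsert_hosts_entry); it assumes it runs somewhere the hosts file is writable and that awk and mktemp are available.

```shell
#!/bin/sh
# upsert_hosts_entry FILE IP NAME
# Make FILE map NAME to IP exactly once, dropping any older line for NAME.
upsert_hosts_entry() {
    file=$1
    ip=$2
    name=$3
    tmp=$(mktemp) || return 1
    # Keep comment lines, and every line that does not list NAME as a hostname.
    awk -v name="$name" '
        /^[ \t]*#/ { print; next }
        { for (i = 2; i <= NF; i++) if ($i == name) next; print }
    ' "$file" > "$tmp"
    printf '%s %s\n' "$ip" "$name" >> "$tmp"
    # Overwrite in place rather than renaming: /etc/hosts inside a container
    # is a bind mount, so replacing the file itself typically fails.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

Calling it twice with the same arguments leaves a single line for the name, so it stays safe to run from cron.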