I have a 3-node Docker swarm.
One stack deployed is a database cluster with 3 replicas (MariaDB Galera).
Another stack deployed is a web application with 2 replicas.
The web application looks like this:
```yaml
version: '3'

networks:
  web:
    external: true
  galera_network:
    external: true

services:
  application:
    image: webapp:latest
    networks:
      - galera_network
      - web
    environment:
      DB_HOST: galera_node
    deploy:
      replicas: 2
```
FWIW, the `web` network is the one Traefik is hooked up to.
The issue here is that `galera_node` (used as the webapp's database host) resolves to a VIP that goes through swarm's mesh routing (as far as I can tell), and that adds extra latency whenever the mesh routing sends traffic over the WAN instead of to the `galera_node` container deployed on the same physical host.
Another option I've found is to use `tasks.galera_node`, but that uses DNSRR across the 3 Galera cluster containers. So 33% of the time things are good and fast, but the rest of the time I have unnecessary latency added to the mix.
These two behaviors look to be documented as what we'd expect from the different `endpoint_mode` options. Reference: Docker `endpoint_mode`
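For reference, this is roughly what opting into DNSRR looks like in the Galera stack's service definition (a sketch; the image tag and service name here are just from my setup). It changes which DNS records the bare service name returns, but not which task gets picked:

```yaml
services:
  galera_node:
    image: mariadb:latest
    networks:
      - galera_network
    deploy:
      replicas: 3
      # default is 'vip'; 'dnsrr' makes the bare service name
      # resolve round-robin to the individual task IPs
      endpoint_mode: dnsrr
```

With `endpoint_mode: dnsrr`, `galera_node` itself behaves the way `tasks.galera_node` does under the default `vip` mode, so it removes the VIP but doesn't solve the locality problem either.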
To illustrate the latency, here's what pinging from within the webapp container looks like. Notice the IP address each ping resolves to, along with the response times.
```shell
### hitting VIP that "masks" the fact that there is extra latency
### behind it depending on where the mesh routing sends the traffic.
root@294114cb14e6:/var/www/html# ping galera_node
PING galera_node (10.0.4.16): 56 data bytes
64 bytes from 10.0.4.16: icmp_seq=0 ttl=64 time=0.520 ms
64 bytes from 10.0.4.16: icmp_seq=1 ttl=64 time=0.201 ms
64 bytes from 10.0.4.16: icmp_seq=2 ttl=64 time=0.153 ms
--- galera_node ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.153/0.291/0.520/0.163 ms

### hitting DNSRR that resolves to worst latency server
root@294114cb14e6:/var/www/html# ping tasks.galera_node
PING tasks.galera_node (10.0.4.241): 56 data bytes
64 bytes from 10.0.4.241: icmp_seq=0 ttl=64 time=60.736 ms
64 bytes from 10.0.4.241: icmp_seq=1 ttl=64 time=60.573 ms
--- tasks.galera_node ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max/stddev = 60.573/60.654/60.736/0.082 ms

### hitting DNSRR that resolves to local galera_node container
root@294114cb14e6:/var/www/html# ping tasks.galera_node
PING tasks.galera_node (10.0.4.242): 56 data bytes
64 bytes from 10.0.4.242: icmp_seq=0 ttl=64 time=0.133 ms
64 bytes from 10.0.4.242: icmp_seq=1 ttl=64 time=0.117 ms
--- tasks.galera_node ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.117/0.125/0.133/0.000 ms

### hitting DNSRR that resolves to other "still too much" latency server
root@294114cb14e6:/var/www/html# ping tasks.galera_node
PING tasks.galera_node (10.0.4.152): 56 data bytes
64 bytes from 10.0.4.152: icmp_seq=0 ttl=64 time=28.218 ms
64 bytes from 10.0.4.152: icmp_seq=1 ttl=64 time=40.912 ms
64 bytes from 10.0.4.152: icmp_seq=2 ttl=64 time=26.293 ms
--- tasks.galera_node ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max/stddev = 26.293/31.808/40.912/6.486 ms
```
The only way I've been able to get decent performance that bypasses the latency is to hard-code the IP address of the local container, but that's obviously not a long-term solution, since containers should be treated as ephemeral.
I totally get that I might need to rethink my geographic node locations because of this latency, and there may be other performance tuning I can do. It seems like there should be a way to enforce my desired behavior, though.
I essentially want to bypass DNSRR and the VIP/mesh-routing behavior whenever a local container is available to service the given request.
So the question is:
How can I have each replica of my webapp talk only to the galera container on its local swarm host, without hard-coding that container's IP address?
If anyone else is fighting with this sort of issue, I wanted to post a solution (though I wouldn't necessarily call it an "answer" to the actual question). It's more of a workaround than something I'm actually happy with.
Inside my webapp, I can use `galera_node` as my database host, and it resolves to the mesh-routing VIP I mentioned above. This gives me functionality no matter what, so if my workaround gets tripped up, I know my connectivity is still intact.
I whipped up a little bash script that I can call from a cron job to get the results I want. It could also be adapted for other use cases that stem from this same issue.
It takes three parameters:

- `$1` = database container name
- `$2` = database network name
- `$3` = webapp container name

The script looks up the database container, finds its IP address on the specified network, and then adds that container name and IP address to the webapp container's `/etc/hosts` file.
This works because the container name is also `galera_node` (in my case), so adding it to the hosts file simply overrides the hostname that Docker would otherwise resolve to the VIP.
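The override mechanics in isolation look like this (a sketch against a throwaway file rather than a real container's `/etc/hosts`; the IP and hostname are just example values from my setup):

```shell
# Demonstrate the hosts-file override and the duplicate check
# the script relies on, using a temp file instead of a container.
HOSTS=/tmp/hosts.demo
HOST_IP=10.0.4.242
HOST_NAME=galera_node

printf '127.0.0.1 localhost\n' > "$HOSTS"

# Same dedup logic as the script: only append if the IP isn't there yet
if ! grep -q "$HOST_IP" "$HOSTS"; then
  echo "$HOST_IP $HOST_NAME" >> "$HOSTS"
fi

cat "$HOSTS"
```

Because `/etc/hosts` is consulted before Docker's embedded DNS, the appended entry wins and the VIP is never used for that name.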
As mentioned, I don't love this, but it does seem to work for my purposes, and it avoids me having to hardcode IP addresses and maintain them manually. I'm sure there are some scenarios that will require tweaks to the script, but it's a functional starting point.
My script: `update_container_hosts.sh`
```shell
#!/bin/bash
# Usage: update_container_hosts.sh <db container name> <db network name> <webapp container name>
HOST_NAME=$1
HOST_NETWORK=$2
CONTAINER_NAME=$3

# Go template that pulls the container's IP on the named network
FMT="{{(index .NetworkSettings.Networks \"$HOST_NETWORK\").IPAddress}}"

# IDs of the local webapp container(s) and the local database container
CONTAINERS=$(docker ps -q --filter "name=$CONTAINER_NAME")
HOST_ID=$(docker ps -q --filter "name=$HOST_NAME" | head -n 1)
HOST_IP=$(docker inspect "$HOST_ID" --format "$FMT")

echo "--- containers ---"
echo "$CONTAINERS"
echo "------------------"
echo "host:    $HOST_NAME"
echo "network: $HOST_NETWORK"
echo "ip:      $HOST_IP"
echo "------------------"

for c in $CONTAINERS; do
  if [ -n "$HOST_IP" ]; then
    # Only add the entry if this IP isn't already in the container's hosts file
    docker cp "$c":/etc/hosts /tmp/hosts.tmp
    IP_COUNT=$(grep -c "$HOST_IP" /tmp/hosts.tmp)
    rm /tmp/hosts.tmp
    if [ "$IP_COUNT" = "0" ]; then
      docker exec "$c" /bin/sh -c "echo $HOST_IP $HOST_NAME >> /etc/hosts"
      echo "$c: Added entry to container hosts file."
    else
      echo "$c: Entry already exists in container hosts file. Skipping."
    fi
  fi
done
```
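For completeness, this is roughly how I run it on each swarm node (the container names, path, and schedule here are illustrative; swarm prefixes container names with the stack name, so adjust the filters for your stack). Every minute is aggressive, but the script is a no-op once the entry exists:

```shell
# One-off run on a swarm node:
./update_container_hosts.sh galera_node galera_network webapp

# Crontab entry on each node (every minute):
* * * * * /path/to/update_container_hosts.sh galera_node galera_network webapp >> /var/log/update_container_hosts.log 2>&1
```

Note it has to run on every node that hosts a webapp replica, since each node's local galera IP is different.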