To get around health checks where SNI is required while using TCP, I have the following, which kind of works:
listen website
    bind :10003
    mode tcp
    server website_proxy_aws localhost:14001 check fall 3 rise 2
    server website_proxy_dc localhost:14002 check fall 3 rise 2
listen website_proxy_aws
    bind :14001
    mode tcp
    option httpchk HEAD / HTTP/1.1\r\nHost:\ website-lb.domain.com
    server website_proxy_svc_aws internal-alb.eu-west-1.elb.amazonaws.com:80 check sni req.hdr(Host) verify none fall 3 rise 2 weight 2
listen website_proxy_do
    bind :14002
    mode tcp
    option httpchk HEAD / HTTP/1.1\r\nHost:\ website-lb.domain.com
    server ivendiwebsite_proxy_svc_dc do-website-lb.domain.com:443 check ssl sni req.hdr(Host) verify none fall 3 rise 2 weight 2
Now if website_proxy_do is down, it appears red for the website_proxy_do listener in the stats page.
But the website listener appears green for both.
I imagine there's a simple explanation for what I'm doing wrong here.
(I'm aware that in this example I could use one listen, as the host is the same across both. I'm just interested in why the website listener is supposedly doing a TCP check but failing to acknowledge that the down site is down.)
I understand what you are seeing. Before I answer, here is how I replicated the problem, so we can be sure we are on the same page.
These are my HAProxy listeners, set up similarly to yours.
listen port_33306
    bind :33306
    mode tcp
    server local-tunnel localhost:23306 check
listen port_23306
    bind :23306
    mode tcp
    server mysql-tunnel localhost:13306 check
I tunnelled a MySQL instance running on Aurora using socat on port 13306, which makes the problem easy to see.
So here is what the chain looks like:
localhost:33306 => localhost:23306 => localhost:13306 => myaurora.aws-aurora-blah-blah.com:3306
Right now, here is what my netstat output looks like:
robot@proxy:~$ netstat -tulpn
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:13306 0.0.0.0:* LISTEN 3465/socat
tcp 0 0 0.0.0.0:23306 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:33306 0.0.0.0:* LISTEN -
And this is what my haproxy?stats page looks like.
Now I take down the socat tunnel on TCP port 13306 to Aurora. As you said, this disrupts the flow and shows listener port_23306 as red on my haproxy?stats page, because nothing is listening on port 13306 any more.
So up to this point we are seeing the same behaviour, and the question is why listener port_33306 is still green while port_23306 is failing its checks.
Right now, this is what my netstat output looks like:
robot@proxy:~$ netstat -tulpn
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:23306 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:33306 0.0.0.0:* LISTEN -
As you can see, the socat tunnel on port 13306 no longer exists because I killed the process, and listener port_23306 is checking localhost:13306. Even though that backend is gone, HAProxy keeps its own bind on port 23306 open and ready for connections, so if I start the tunnel again the listener goes back to green without restarting HAProxy.
Because 23306 stays open, the check in listener port_33306, which forwards port 33306 to localhost:23306, only tests whether a TCP connection to 23306 succeeds. It does not look any further down the chain to see whether 23306 is actually connected to a working process.
Hence listener port_33306 stays green, because its check targets port 23306, which HAProxy itself is still listening on, while listener port_23306 is red, because its check targets port 13306, which is closed. In short, health checks pass for port 23306 but not for 13306.
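If the goal is for the front listener to go red whenever the inner listener's backend is down, one option that may be worth trying is HAProxy's track server keyword, which makes a server take its state from another server's health checks instead of running its own. A minimal, untested sketch based on the test listeners above (the only change is replacing check with track on the local-tunnel line; whether it fits a given HAProxy version and setup is an assumption):

```
listen port_33306
    bind :33306
    mode tcp
    # No "check" here: local-tunnel follows the state of server
    # mysql-tunnel in proxy port_23306 (assumption: track suits this setup).
    server local-tunnel localhost:23306 track port_23306/mysql-tunnel
listen port_23306
    bind :23306
    mode tcp
    # This is the server that actually runs a check, against port 13306.
    server mysql-tunnel localhost:13306 check
```

With this, the front server should not be marked up merely because HAProxy's own bind on 23306 accepts connections. Its status would follow the check against 13306 instead.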
It's difficult to explain in exact terms and everything, but I tried.
Hope this helps.