Why doesn't the HAProxy TCP check work when proxying via itself?

To get around health checks that require SNI while using TCP mode, I have the following config, which kind of works:

listen website
  bind :10003
  mode tcp

  server website_proxy_aws localhost:14001  check fall 3 rise 2
  server website_proxy_dc  localhost:14002  check fall 3 rise 2

listen website_proxy_aws
  bind :14001
  mode tcp
  option httpchk HEAD / HTTP/1.1\r\nHost:\ website-lb.domain.com
  server website_proxy_svc_aws internal-alb.eu-west-1.elb.amazonaws.com:80 check sni req.hdr(Host) verify none fall 3 rise 2 weight 2

listen website_proxy_do
  bind :14002
  mode tcp
  option httpchk HEAD  / HTTP/1.1\r\nHost:\ website-lb.domain.com
  server ivendiwebsite_proxy_svc_dc do-website-lb.domain.com:443 check ssl sni req.hdr(Host) verify none  fall 3 rise 2 weight 2

Now, if website_proxy_do is down, its server appears red under the website_proxy_do listener on the stats page. But under the website listener, both servers still appear green.

I imagine there's a simple explanation to what I'm doing wrong here.

(I'm aware that in this example I could use a single listen section, as the host is the same across both. I'm just interested in why the website listener, which is supposedly doing a TCP check, fails to notice that the down site is down.)

Solution

I understand what you are seeing. Before I answer, here is how I replicated the problem, so we can be sure we are on the same page.

These are my HAProxy listeners, set up similarly to yours.

listen port_33306
   bind :33306
   mode tcp
   server local-tunnel localhost:23306 check


listen port_23306
   bind :23306
   mode tcp
   server mysql-tunnel localhost:13306 check

I tunnelled a MySQL instance running on Aurora to local port 13306 using socat, which makes it easy to see what the problem is.

So here is what the chain looks like:

localhost:33306 => localhost:23306 => localhost:13306 => myaurora.aws-aurora-blah-blah.com:3306

Right now, here is what my netstat looks like

robot@proxy:~$ netstat -tulpn
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name                    
tcp        0      0 0.0.0.0:13306           0.0.0.0:*               LISTEN      3465/socat          
tcp        0      0 0.0.0.0:23306           0.0.0.0:*               LISTEN      -                   
tcp        0      0 0.0.0.0:33306           0.0.0.0:*               LISTEN      -

At this point, both listeners show as green on my haproxy?stats page.

Now I take down the socat tunnel from TCP port 13306 to Aurora. As you said, this disrupts the flow, and listener port_23306 turns red on my haproxy?stats page because nothing is listening on port 13306 any more.

So up to here we see the same behaviour. The question now is why listener port_33306 is still green while port_23306 is failing its checks.

Right now, this is what my netstat looks like.

robot@proxy:~$ netstat -tulpn
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name                    
tcp        0      0 0.0.0.0:23306           0.0.0.0:*               LISTEN      -                   
tcp        0      0 0.0.0.0:33306           0.0.0.0:*               LISTEN      -

As you can see, the socat tunnel on port 13306 no longer exists because I killed the process, while listener port_23306 is still checking localhost:13306. Even though that backend is gone, HAProxy keeps its own bind on port 23306 open and accepting connections. If I start the tunnel again, port_23306 goes back to green without restarting HAProxy.

Because 23306 is still open, the listener port_33306, which binds port 33306 and forwards to localhost:23306, only checks that it can connect to 23306. A plain TCP check is just a connection attempt; it doesn't look any further down the chain to see whether 23306 itself has a working backend.

Hence listener port_33306 stays green: its check targets port 23306, which HAProxy is still listening on. Listener port_23306 is red because its check targets port 13306, which is no longer open. So health checks pass for port 23306 but fail for 13306.
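To make the behaviour described above concrete, here is a minimal Python sketch. It is not HAProxy's actual check code, only an illustration under the assumption that a bare `check` in TCP mode amounts to a connection attempt. The helper name `tcp_check` and the two stand-in sockets are hypothetical; the OS picks the ports, so they do not match the port numbers used above.

```python
import socket


def tcp_check(host, port, timeout=1.0):
    """Return True if a TCP connection can be established (hypothetical helper)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Stand-in for HAProxy's bind on the middle listener: a socket that is
# listening but has no working backend behind it.
middle = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
middle.bind(("127.0.0.1", 0))  # port 0 lets the OS choose a free port
middle.listen()
middle_port = middle.getsockname()[1]

# Stand-in for the killed tunnel: reserve a port, then close it so that
# nothing is listening there any more.
dead = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
dead.bind(("127.0.0.1", 0))
dead_port = dead.getsockname()[1]
dead.close()

# The front listener's view of the middle listener: the connect succeeds.
front_sees_middle = tcp_check("127.0.0.1", middle_port)
# The middle listener's view of its backend: the connect is refused.
middle_sees_backend = tcp_check("127.0.0.1", dead_port)

print(front_sees_middle, middle_sees_backend)
middle.close()
```

The first check should succeed and the second should fail, which mirrors the stats page: the listener whose target is HAProxy's own bind stays green, and only the listener whose target is the dead port goes red.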

It's difficult to explain in exact terms and everything, but I tried.

Hope this helps.