
NGINX routing based on server 200 response failures


My goal is to configure NGINX's stream block(s) so that requests are routed to a backup upstream when the primary one fails certain health checks (2/3).

The health checks, while fairly specific, shouldn't be an issue, I believe:

- TCP 1212 availability
- TCP 1912 availability
- HTTP GET on 7078 /?
  - Response should be 200, and if I can somehow get the body to check that it's as expected, even better!

If these checks fail on one upstream "cluster", so to speak, I would like to route requests to another, identical cluster, much like a backup.

The issue I'm solving is that the servers are quite literally half a world apart, so load balancing through one server would add the same latency as waiting for it to fail. So while a load balancer would give me the "routing" behavior in the end, the response time would be unacceptable.

Is there a way to do this in NGINX configs or am I spreading it too thin?


Solution

  • The NGINX upstream module will do passive health checks for you, meaning it will react to connection failures, and optionally switch to backup servers as necessary. To some extent, that might be enough for you.
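
To make the passive option concrete, here is a minimal sketch, written as a small shell function that writes out a sample stream config. The host names, the upstream name `app_1212`, and the timeout and failure values are all placeholders I made up, not values from the question. The `backup` parameter keeps traffic off the second cluster until the primary is marked unavailable, which is the behavior you want given the latency concern.

```shell
#!/usr/bin/env bash
# Writes a sample stream config with passive failover to the file given as $1.
# All hosts, names, and numbers below are illustrative assumptions.
write_passive_failover_conf() {
  cat > "$1" <<'EOF'
stream {
    upstream app_1212 {
        # Primary: after 2 failed attempts within 10s, treat it as down for 10s.
        server primary.example.com:1212 max_fails=2 fail_timeout=10s;
        # Only receives connections while the primary is marked unavailable.
        server backup.example.com:1212 backup;
    }

    server {
        listen 1212;
        # Give up on a slow connect quickly so failover is not delayed.
        proxy_connect_timeout 2s;
        proxy_pass app_1212;
    }
}
EOF
}
```

Note that `stream` is a top-level block, so it belongs in the main nginx.conf context rather than inside `http`. You would repeat the upstream/server pair for port 1912.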

    What you're describing here, though, are active health checks, which let you probe ports other than the traffic port and assert on HTTP status, header values and even body content. Unfortunately, having dangled that in front of you, these are only available as part of the NGINX Commercial Subscription (NGINX Plus), which I'm guessing isn't what you're looking for.

    If you do need those kinds of proactive health checks, you can still do them from outside of NGINX. One approach might be:

    1. put your upstreams in separate confs, and include one of them where you need it
    2. use ncat and/or curl in an every-minute cron job to do the tests that matter to you
    3. if ever those tests fail, switch out the upstream confs, and tell NGINX to do a zero-downtime reload

    You can switch confs with a quick mv that renames the right one to match the include; you shouldn't have to rewrite anything.
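
The three steps above could be sketched roughly as follows. Treat this as a starting point under assumptions, not a finished tool: the host name, the conf directory and file names, the expected body text, the request path `/`, and the reading of "2/3" as "switch once two of the three checks fail" are all guesses of mine. It also copies the chosen conf and then renames the copy into place (rather than a bare mv) so that both source files stay available for switching back.

```shell
#!/usr/bin/env bash
# Cron-driven health check sketch. Every default below is a placeholder.
PRIMARY_HOST="${PRIMARY_HOST:-primary.example.com}"  # hypothetical host
CONF_DIR="${CONF_DIR:-/etc/nginx/upstreams}"         # holds primary.conf and backup.conf
ACTIVE_CONF="${ACTIVE_CONF:-$CONF_DIR/active.conf}"  # the file nginx.conf includes
EXPECTED_BODY="${EXPECTED_BODY:-OK}"                 # text the 200 response should contain
FAIL_THRESHOLD="${FAIL_THRESHOLD:-2}"                # assumed meaning of "2/3"

# Succeeds if a TCP connection to host ($1) on port ($2) opens within 3 seconds.
tcp_ok() {
  nc -z -w 3 "$1" "$2" >/dev/null 2>&1
}

# Succeeds only if GET on url ($1) returns status 200 and the body contains $2.
http_ok() {
  local url="$1" expected="$2" body_file status rc=1
  body_file=$(mktemp) || return 1
  status=$(curl --silent --max-time 5 --output "$body_file" \
                --write-out '%{http_code}' "$url")
  if [ "$status" = "200" ] && grep -qF -- "$expected" "$body_file"; then
    rc=0
  fi
  rm -f "$body_file"
  return "$rc"
}

# Runs the three checks from the question against host ($1); prints how many failed.
count_failures() {
  local host="$1" failed=0
  tcp_ok "$host" 1212 || failed=$((failed + 1))
  tcp_ok "$host" 1912 || failed=$((failed + 1))
  http_ok "http://$host:7078/" "$EXPECTED_BODY" || failed=$((failed + 1))
  echo "$failed"
}

# Prints the conf file name that should be active for a failure count ($1).
choose_conf() {
  if [ "$1" -ge "$FAIL_THRESHOLD" ]; then
    echo "backup.conf"
  else
    echo "primary.conf"
  fi
}

# Validates the configuration, then asks NGINX for a graceful reload.
reload_nginx() {
  nginx -t && nginx -s reload
}

# Puts conf ($1) in place of the included file; reloads only if it changed.
activate_conf() {
  local src="$CONF_DIR/$1"
  cmp -s "$src" "$ACTIVE_CONF" && return 0
  cp "$src" "$ACTIVE_CONF.tmp" &&
    mv -f "$ACTIVE_CONF.tmp" "$ACTIVE_CONF" &&
    reload_nginx
}

# One full pass: check the primary, pick a conf, activate it.
run_check() {
  activate_conf "$(choose_conf "$(count_failures "$PRIMARY_HOST")")"
}
```

To use it, add a `run_check` call as the last line and point an every-minute cron entry at the script. Because the reload is skipped when the active conf already matches, running it every minute should not cause needless reloads.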