We have a Kubernetes cluster behind an L4 load balancer, but we have no programmatic access to the load balancer to add or remove nodes when we need to update or reboot them (the LB is managed by our hosting provider's support team).
The load balancer does support health checks, but the current setup simply calls port 80 on each node to determine whether it is healthy. That check succeeds even when the node is drained, so our only option is to reboot the node and wait up to 10 seconds for the LB to notice and take it out of the set once kubeapi dies.
I want something like a per-node pod that we could use to determine whether the node is alive, presumably exposed via a NodePort. The problem is that I can't find a way to do this. If I use a DaemonSet, I don't think the pods are evicted during a drain, so that wouldn't work; and if I use a normal Deployment, there is no guarantee that every healthy node will run an instance of the pod, so a healthy node could appear unhealthy. Even with anti-affinity configured, I don't think there is any guarantee that every healthy node will have a running pod to check.
Does anyone know a way of using a TCP or HTTP call to a node to detect that it has been drained?
It seems that the solution you are looking for is fully described in the Kubernetes documentation on monitoring node health:
> Node Problem Detector is a daemon for monitoring and reporting about a node's health. You can run Node Problem Detector as a DaemonSet or as a standalone daemon. Node Problem Detector collects information about node problems from various daemons and reports these conditions to the API server as NodeCondition and Event.
Based on those conditions, you can build your own node monitoring.
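As a rough sketch of what that monitoring could look like: once Node Problem Detector is reporting conditions, a check can flag a node as unhealthy when `Ready` is not `True` or when any problem condition is `True`. The `node` dict below mirrors the shape of the API server's Node resource, and `PROBLEM_CONDITIONS` is a hypothetical allowlist you would tune to your setup (`KernelDeadlock` is one condition Node Problem Detector can report):

```python
# Decide whether a node is healthy from its NodeCondition list.
# PROBLEM_CONDITIONS is a hypothetical set of condition types to
# treat as fatal; adjust it to match what your detectors report.
PROBLEM_CONDITIONS = {"KernelDeadlock", "ReadonlyFilesystem"}

def node_is_healthy(node: dict) -> bool:
    conditions = node.get("status", {}).get("conditions", [])
    for cond in conditions:
        # "Ready" must be True; problem conditions must not be True.
        if cond["type"] == "Ready" and cond["status"] != "True":
            return False
        if cond["type"] in PROBLEM_CONDITIONS and cond["status"] == "True":
            return False
    return True
```

In a real setup you would feed this function the Node object fetched from the API server rather than a hand-built dict.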
You also need to be aware of its limitations:

- Node Problem Detector only supports file-based kernel logs. Log tools such as journald are not supported.
- Node Problem Detector uses the kernel log format for reporting kernel issues. To learn how to extend the kernel log format, see Add support for another log format.