kubernetes stonith

STONITH for Kubernetes


Does Kubernetes support STONITH operations for hardware nodes? We have smart electric sockets that expose an API for 'power off server', and they work great with Pacemaker.

Does Kubernetes support STONITH?


Solution

  • Not yet.
    STONITH is mentioned in Kubernetes issue 39828.

    STONITH ("Shoot The Other Node In The Head" or "Shoot The Offending Node In The Head"), sometimes called STOMITH ("Shoot The Other Member/Machine In The Head"), is a technique for fencing in computer clusters.

    Fencing is the isolation of a failed node so that it does not cause disruption to a computer cluster. As its name suggests, STONITH fences failed nodes by resetting or powering down the failed node.

    It is actually discussed in kubernetes/kops issue 2002:

    I think we should take a look at the autoscaler and I think we could default to Reboot, perhaps configurable in the manifest to AllowTermination.

    But that issue is stale at the moment.

    Fencing is also described in kubernetes/community/contributors/design-proposals/storage/pod-safety.md:

    In order to reconcile partitions, an actor (human or automated) must decide when the partition is unrecoverable. The actor may be informed of the failure in an unambiguous way (e.g. the node was destroyed by a meteor) allowing for certainty that the processes on that node are terminated, and thus may resolve the partition by deleting the node and the pods on the node.
    Alternatively, the actor may take steps to ensure the partitioned node cannot return to the cluster or access shared resources - this is known as fencing and is a well understood domain.
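
Since Kubernetes does not do this for you, the "automated actor" described above would have to be something you run yourself. The following is only a minimal sketch of that idea, not a Kubernetes feature or API: it powers the node off, waits until the power device confirms the node is off, and only then removes the node from the cluster. All names here (`fence_node`, `resolve_partition`, `FencingFailed`, and the `power_off`, `is_powered_off` and `delete_node` callables) are hypothetical; the callables stand in for your smart socket's API and for whatever you use to delete the node.

```python
import time
from typing import Callable


class FencingFailed(RuntimeError):
    """Raised when the power device never confirms that the node is off."""


def fence_node(
    node: str,
    power_off: Callable[[str], None],        # hypothetical: calls the smart socket API
    is_powered_off: Callable[[str], bool],   # hypothetical: asks the socket for its state
    timeout_s: float = 30.0,
    poll_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Power the node off and return True only once the device confirms it."""
    power_off(node)
    deadline = clock() + timeout_s
    while True:
        if is_powered_off(node):
            return True
        if clock() >= deadline:
            # No confirmation: the node may still be running, so it is NOT fenced.
            return False
        sleep(poll_s)


def resolve_partition(
    node: str,
    fence: Callable[[str], bool],
    delete_node: Callable[[str], None],      # hypothetical: removes the node from the cluster
) -> None:
    """Delete the node only after fencing has been confirmed."""
    if not fence(node):
        raise FencingFailed(f"could not confirm power-off of {node}; not deleting it")
    delete_node(node)
```

The ordering is the point of the sketch: the node is deleted only after the power-off is confirmed, because deleting a node that might still be running gives no certainty that its processes have stopped. In practice `delete_node` could be a thin wrapper around `kubectl delete node <name>`, and `power_off` would call whatever API your sockets provide.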