failovercluster bully-algorithm

What election algorithm does Microsoft Failover Cluster use?


I cannot find anything about the algorithm it uses to elect the primary node.

http://msdn.microsoft.com/en-us/library/aa373130%28v=vs.85%29.aspx

Is it a bully algorithm, a ring algorithm, or some other algorithm?


Solution

  • I'm not quite sure if this is exactly what you're looking for, but here is what I found on a website that explains Failover Clustering:

    Quorum implementation in Windows Server

    When a failover cluster is brought online (assuming one node at a time), the first disk brought online is the one associated with the deployed quorum model. To do this, the failover cluster executes a disk arbitration algorithm to take ownership of that disk on the first node, initially marking it as offline and then going through a few checks. When the cluster is satisfied that there are no problems with the quorum, the disk is brought online. The same thing happens with the other disks. After all the disks come online, the Cluster Disk Driver sends periodic reservations every 3 seconds to keep ownership of the disks.

    If for some reason the cluster loses communication over all of its networks, the quorum arbitration process begins. The outcome is straightforward: the node that currently owns the reservation on the quorum is the defending node, and the other nodes become challengers. When a challenger detects that it cannot communicate, it issues a request to break any existing reservations it owns, via a bus-wide SCSI reset in Windows Server 2003 or a persistent reservation in Windows Server 2008. Seven seconds after this reset, the challenger attempts to gain control of the quorum, and one of three things happens:

      • If the node that already owns the quorum is up and running, it still holds the reservation on the quorum disk, so the challenger cannot take ownership and shuts down the Cluster Service.
      • If the node that owns the quorum fails and gives up its reservation, the challenger can take ownership after 10 seconds elapse. The challenger can then reserve the quorum, bring it online, and subsequently take ownership of the other resources in the cluster.
      • If no node of the cluster can gain ownership of the quorum, the Cluster Service is stopped on all nodes.

    I hope that helps. Good luck :)

    The link to the website, so you can read more: http://networksandservers.blogspot.com/2011/04/failover-clustering-i.html
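
To make the defender/challenger sequence quoted above easier to follow, here is a toy simulation in Python. It is only a sketch of the description, not how the Cluster Service is actually implemented: `QuorumDisk`, `arbitrate`, the `healthy` flag, and the node names are all made up for illustration, time is a simple loop counter, and the break request is modelled as clearing whatever reservation exists. The 3-second renewal and 7-second wait are taken from the quoted text.

```python
from dataclasses import dataclass
from typing import Optional

RENEW_INTERVAL_S = 3   # the owner renews its reservation every 3 s (quoted text)
CHALLENGE_DELAY_S = 7  # the challenger waits 7 s after the reset (quoted text)


@dataclass
class QuorumDisk:
    """Hypothetical stand-in for the quorum disk and its reservation."""
    owner: Optional[str] = None
    healthy: bool = True  # assumption: False models a disk nobody can reserve

    def reserve(self, node: str) -> bool:
        # A reservation succeeds only on a healthy disk that is free
        # or already reserved by the same node.
        if self.healthy and self.owner in (None, node):
            self.owner = node
            return True
        return False

    def break_reservation(self) -> None:
        # Simplified model of the SCSI reset / persistent reservation request.
        self.owner = None


def arbitrate(disk: QuorumDisk, defender: str, challenger: str,
              defender_alive: bool) -> Optional[str]:
    """Run one challenge and return the quorum owner, or None if nobody wins."""
    disk.break_reservation()
    # Simulated clock: a live defender re-reserves on its renewal ticks
    # while the challenger is still waiting.
    for second in range(1, CHALLENGE_DELAY_S + 1):
        if defender_alive and second % RENEW_INTERVAL_S == 0:
            disk.reserve(defender)
    # The challenger's attempt fails if the defender got there first.
    disk.reserve(challenger)
    return disk.owner
```

With a live defender, `arbitrate(disk, "node1", "node2", defender_alive=True)` returns `"node1"`, which corresponds to the challenger shutting down its Cluster Service. With `defender_alive=False` it returns `"node2"`, and with `QuorumDisk(healthy=False)` it returns `None`, the case where the service stops on all nodes. In this model the winner is decided by who holds the disk reservation rather than by node IDs or by passing a token between nodes.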