
Starting all nodes in galera at once


I have a Galera cluster of three nodes. If I shut down the three virtual machines and start them all at once, systemd automatically starts MariaDB on each of them.

Sometimes all of the MariaDB instances start at the same time, and this results in a broken cluster, which I then have to reinitialize using galera_new_cluster.

The question is: why does starting all the MariaDB instances at once break the cluster?

Thank you


Solution

  • Whenever you start a node, it either starts as the first node in the cluster (initiating a new cluster) or attempts to connect to existing nodes using wsrep_cluster_address. Which of the two it does depends on the node's options.

    So every time you shut down or lose all nodes and start them again, there is nothing to connect to, and you need to start a new cluster. galera_new_cluster does that by starting a node with the --wsrep-new-cluster option, which overrides the current value of wsrep_cluster_address.

    If it sometimes works for you automatically, it most likely means that one of your nodes is permanently configured as the "first node", either via wsrep_cluster_address=gcomm:// or via wsrep-new-cluster. That is a wrong setup in itself: if you lose or shut down only this node and have to restart it, it won't join the remaining nodes in the cluster; it will create a new one.

    When you start all nodes at once, you create a race condition. If your "first node" comes up first and initializes quickly enough, it will create a new cluster, and other nodes will join it. If another node comes up first, it won't be able to join anything, thus you get a "broken cluster".

    You can find more information on restarting the whole cluster here: http://galeracluster.com/documentation-webpages/restartingcluster.html
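
As a rough illustration of choosing which node to bootstrap after a full shutdown, here is a minimal Python sketch. It assumes you have copied the contents of each node's grastate.dat (normally in the MariaDB data directory) and that the file contains the seqno and safe_to_bootstrap fields written by recent Galera versions. The node names, the sample file contents, and the function names are hypothetical and not part of Galera or MariaDB.

```python
def parse_grastate(text):
    """Parse the 'key: value' lines of a grastate.dat file into a dict."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the comment header
        key, sep, value = line.partition(":")
        if sep:
            state[key.strip()] = value.strip()
    return state


def pick_bootstrap_node(grastates):
    """Given {node_name: grastate.dat text}, return the node to bootstrap.

    Prefers a node marked 'safe_to_bootstrap: 1'; otherwise falls back to
    the node with the highest non-negative seqno. Returns None when no
    node has a usable seqno (for example, all are -1 after a crash).
    """
    parsed = {name: parse_grastate(text) for name, text in grastates.items()}

    for name, state in parsed.items():
        if state.get("safe_to_bootstrap") == "1":
            return name

    best_name, best_seqno = None, -1
    for name, state in parsed.items():
        try:
            seqno = int(state.get("seqno", "-1"))
        except ValueError:
            seqno = -1
        if seqno > best_seqno:
            best_name, best_seqno = name, seqno
    return best_name


if __name__ == "__main__":
    # Hypothetical file contents collected from three nodes.
    template = (
        "# GALERA saved state\n"
        "version: 2.1\n"
        "uuid:    6b1d5ae2-0000-0000-0000-000000000001\n"
        "seqno:   {seqno}\n"
        "safe_to_bootstrap: {safe}\n"
    )
    nodes = {
        "node1": template.format(seqno=1200, safe=0),
        "node2": template.format(seqno=1200, safe=0),
        "node3": template.format(seqno=1204, safe=1),
    }
    print(pick_bootstrap_node(nodes))  # prints "node3"
```

You would then run galera_new_cluster only on the chosen node and start MariaDB normally on the others once it is up. If the function returns None, the nodes' positions have to be recovered first, which this sketch does not attempt.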