
MariaDB Spider with Galera Clusters failover solutions


I am having trouble designing a database setup for an experiment that needs both high availability (HA) and performance (through sharding).

Currently I have one Spider node and two Galera clusters (three nodes in each), as shown in the figure below. This configuration works well in the normal case:

[Figure 1: one Spider node in front of two three-node Galera clusters]

However, as far as I know, when the Spider engine shards data it must be given one primary IP address per shard, so it sends its SQL statements to a single node in each of the two Galera clusters.

So my first question here is:

Q1) When machine .12 goes down because of a failure, how can I make .13 or .14 automatically take over from .12?

  • The servers that the Spider engine knows about:

[Figure: the servers known to the Spider engine]

Q2) Are there any open-source tools or technologies that can help me handle this situation? If so, please explain how they work. (Maybe MaxScale? I don't know what it is or what it can do.)

Q3) The motivation for this experiment is as follows. An automated factory has many machines, and each machine generates data that must be recorded during production (perhaps hundreds or thousands of data points per second) so that we can monitor the machines and optimize the quality of each batch of products. So my question is: is this architecture (Figure 1) suitable for that, or would you suggest something different?


Solution

  • You could use MaxScale in front of the Galera cluster to make the individual nodes appear like a combined cluster. This way Spider will be able to seamlessly access the shard even if one of the nodes fails. You can take a look at the MaxScale tutorial for instructions on how to configure it for a Galera cluster.

    Something like this should work:

    [Figure: the Spider node connecting to each Galera cluster through MaxScale]

    This of course has the same limitation as a single database node: if the MaxScale server goes down, you'll have to switch to a different MaxScale instance for that cluster. The benefit of using MaxScale is that it is, in some sense, stateless, which means it can be started and stopped almost instantly. A network load balancer (e.g. ELB) in front of several MaxScale instances can already provide some protection against this problem.
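
    As a rough illustration of the setup described above, here is a minimal sketch of a MaxScale configuration for one of the two Galera clusters. It is based on the usual pattern of server definitions, a `galeramon` monitor, a `readwritesplit` service, and a listener. Everything specific in it is an assumption for illustration: the `192.0.2.x` addresses are placeholders standing in for .12, .13, and .14, and the section names, user names, passwords, and listener port 4006 are hypothetical.

    ```ini
    # maxscale.cnf (sketch for ONE Galera cluster; all names and addresses are placeholders)

    [maxscale]
    threads=auto

    # The three Galera nodes of this cluster
    [galera1-node1]
    type=server
    address=192.0.2.12
    port=3306
    protocol=MariaDBBackend

    [galera1-node2]
    type=server
    address=192.0.2.13
    port=3306
    protocol=MariaDBBackend

    [galera1-node3]
    type=server
    address=192.0.2.14
    port=3306
    protocol=MariaDBBackend

    # The monitor tracks which nodes are healthy, synced members of the cluster
    # and picks one of them to receive writes
    [Galera1-Monitor]
    type=monitor
    module=galeramon
    servers=galera1-node1,galera1-node2,galera1-node3
    user=maxscale_monitor
    password=change_me
    monitor_interval=2000ms

    # The service routes client traffic to whichever nodes the monitor reports as usable
    [Galera1-Service]
    type=service
    router=readwritesplit
    servers=galera1-node1,galera1-node2,galera1-node3
    user=maxscale_service
    password=change_me

    # Spider connects to this port instead of connecting to a Galera node directly
    [Galera1-Listener]
    type=listener
    service=Galera1-Service
    protocol=MariaDBClient
    port=4006
    ```

    With something like this in place, the Spider server definition for this shard would point at the MaxScale host and port 4006 rather than at .12. If .12 fails, the monitor marks it as down and the service routes to .13 or .14 without any change on the Spider side. The second cluster would get its own equivalent set of sections, or its own MaxScale instance.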