Search code examples
mysqlperconagaleragtid

Does MySQL stalls the whole cluster during DDL statements?


Recently, I read that Galera based MySQL cluster uses a concept called total order isolation (https://galeracluster.com/library/documentation/schema-upgrades.html#toi) for DDL's by default which stalls the writes on the whole cluster until it is commited on all the nodes.

How does MySQL handles DDL in native asynchronous replication ? Does it stall writes for the other schemas as well?


Solution

  • Native Replication sticks the DDL into the replication stream. When the command pops up in the Slave, it executes the DDL before moving on to other queries in the replication stream.

    Caveat: The above statement assumes old flavor, without multi-master replication or multiple replication threads. Regardless of this caveat the table being modified is blocked on the Slave just as it was on the Master.

    Galera's TOI goes to some extra effort to make sure all the nodes are in sync, even accounting for the DDL versus ordinary writes. Hnece the name "Total Order of Inserts".

    Galera's RSU is, in many cases, a viable alternative. It is not more invasive than a crash of each node, one at a time (hence "Rolling"). Assuming connections can failover to different nodes, RSU avoids other blockage.

    Still, you should make a conscious choice between RSU and TOI; there are use cases for dictating one versus the other.

    In a distributed system (multiple nodes, multiple clients, etc), pushing code gets tricky. I like to take this approach, even though it leads to perhaps 3 times as many pushes:

    1. Push application code to discover whether the database change has been pushed. Have the code work either with the old schema or new. Do this "push" in a "rolling" manor
    2. Push the new schema (eg CREATE/ALTER TABLE).
    3. Clean up the code. (Again, "roll" it out to the many clients.)