I have the following MySQL instances, along with a replication setup:
S1 -----> (M1 <--> M2), where:
M1 - M2 is a multi-master replication setup,
S1 - a slave which replicates the writes which are done at Master M1.
Now, I'm trying to enhance the setup with a channel failover mechanism, where S1 would start replicating from M2, should M1 go down. Currently, the only way of doing this that I see is:
(M1 failure detection mechanism on S1 machine), then:
-> S1 gets the latest timestamp of M1's queries from the local relay log file.
-> M2 searches (bash script using mysqlbinlog utility) for the local binlog file + binlog index which corresponds to S1's latest timestamp
-> S1 can finally do a "STOP SLAVE", "CHANGE MASTER TO master_host=M2... master_log_file=... master_log_pos=...", etc. command to continue replication, but from M2 this time
Is there a better (and less error prone) way of doing this?
Thank you
EDIT: Nowadays, this is much easier to achieve thanks to the unique Xid
binlog query tags commonly used by the publicly accessible MySQL clustering solutions.
There is a more simplistic way to retrieve the binlog and position needed.
Would it make more sense to just use the current binlog and position as M2 knows it ? You need to check the Slave status on M2.
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.64.51.130
Master_User: replicant
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000463
Read_Master_Log_Pos: 453865699
Relay_Log_File: relay-bin.001226
Relay_Log_Pos: 453865845
Relay_Master_Log_File: mysql-bin.000463
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB: search_cache
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 453865699
Relay_Log_Space: 453866038
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 106451130
1 row in set (0.00 sec)
mysql>
For this display, there are five crucial components:
Relay_Log_Space
Please note Relay_Log_Space
. Once this number stops incrementing, every SQL statement imported from the Master has been read. Unfortunately, it is possible that the last relay log may be corrupt or simply incomplete because of a sudden failover.
Replication Coordinates
Please also note that the Replication Coordinates (Relay_Master_Log_File, Exec_Master_Log_Pos)
. This is the position you are hunting for. However, like Relay_Log_Space
it may still be incrementing. In fact, those Replication Coordinates should be equal to the other Replication Coordinates (Master_Log_File,Read_Master_Log_Pos )
. That's when you know everything is caught up. If the pair of Replication Coordinates never meet, then you should rely on Relay_Log_Space
a little more in terms of when it stops incrementing.
Seconds_Behind_Master
?The reason you cannot use Seconds_Behind_Master
is simple. Once a Master goes down hard, all it takes just one Replication thread (Slave_IO_Running
or Slave_SQL_Running
) to become No
and Seconds_Behind_Master
turns NULL
.