
mysql slave replication failed


I have a setup where a DNS server stores its records in MySQL: all records are replicated from the master to a slave, and the slave is used for resolution. Replication broke after a MySQL server upgrade. The MySQL server stopped, and by the time it was recovered the binary log file name and log position on the master had changed. I know that if I point the slave at the new log file and position, replication will start again, but I would miss a lot of updates, and I can't afford that — every single update is important. What should I do to restart master-slave replication without losing any updates from the master? Here is some information from the slave status:

 Slave_IO_Running: No
 Slave_SQL_Running: Yes

Last_IO_Errno: 1236
Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Could not find first log file name in binary log index file'

Thanks


Solution

  • You may be in for a nice surprise, but here it goes:

    Run SHOW SLAVE STATUS\G. For the sake of example, let's say you get this:

                 Slave_IO_State: Waiting for master to send event
                    Master_Host: 10.64.68.253
                    Master_User: replusername
                    Master_Port: 3306
                  Connect_Retry: 60
                Master_Log_File: mysql-bin.003202
            Read_Master_Log_Pos: 577991837
                 Relay_Log_File: relay-bin.010449
                  Relay_Log_Pos: 306229695
          Relay_Master_Log_File: mysql-bin.003202
               Slave_IO_Running: Yes
              Slave_SQL_Running: Yes
                Replicate_Do_DB:
            Replicate_Ignore_DB:
             Replicate_Do_Table:
         Replicate_Ignore_Table: 
        Replicate_Wild_Do_Table:
    Replicate_Wild_Ignore_Table: 
                     Last_Errno: 0
                     Last_Error:
                   Skip_Counter: 0
            Exec_Master_Log_Pos: 577991837
                Relay_Log_Space: 306229695
                Until_Condition: None
                 Until_Log_File:
                  Until_Log_Pos: 0
             Master_SSL_Allowed: No
             Master_SSL_CA_File:
             Master_SSL_CA_Path:
                Master_SSL_Cert:
              Master_SSL_Cipher:
                 Master_SSL_Key:
          Seconds_Behind_Master: 0
    

    Pick the following two values from the output:

    • Relay_Master_Log_File (mysql-bin.003202)
    • Exec_Master_Log_Pos (577991837)

    Here is why: Relay_Master_Log_File and Exec_Master_Log_Pos identify the last binlog entry from the Master that made it to the Slave and was executed successfully. Simply pick up from there.

    You would simply run this code:

    STOP SLAVE;
    CHANGE MASTER TO
    MASTER_LOG_FILE='mysql-bin.003202',
    MASTER_LOG_POS=577991837;
    START SLAVE;
    

    Give it a Try !!!

    CAVEAT

    If Relay_Master_Log_File no longer exists on the Master, you may have to do some damage control. Given the SHOW SLAVE STATUS\G mentioned before, you may have to skip to the next binary log on the Master as follows:

    STOP SLAVE;
    CHANGE MASTER TO
    MASTER_LOG_FILE='mysql-bin.003203',
    MASTER_LOG_POS=4;
    START SLAVE;
    
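    Before skipping ahead, it is worth confirming which binary logs the Master actually still has. A quick way to check (the host and user below are taken from the SHOW SLAVE STATUS example above; adjust them to your environment):

    ```shell
    # List the binary logs the Master still retains.
    # Host and replication user are the example values from SHOW SLAVE STATUS.
    mysql -h 10.64.68.253 -u replusername -p -e "SHOW BINARY LOGS;"

    # If mysql-bin.003202 is absent from the list but mysql-bin.003203 exists,
    # skipping to the next binlog as shown above is your only option short of
    # a full reload of the Slave.
    ```
    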

    If replication catches up, you are not out of the woods just yet. You may have to download Percona Toolkit and run pt-table-checksum and pt-table-sync to find and repair the data the Slave lost.
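    A rough sketch of that repair pass, assuming Percona Toolkit is installed and you run it against the Master (host, user, and password below are placeholders, not values from this setup):

    ```shell
    # 1. Checksum all tables on the Master. Results go into percona.checksums,
    #    which replicates down to the Slave so the two can be compared.
    pt-table-checksum --replicate=percona.checksums \
        h=10.64.68.253,u=replusername,p=secret

    # 2. First print the differences found, then actually sync the Slave
    #    from the Master.
    pt-table-sync --replicate=percona.checksums --print \
        h=10.64.68.253,u=replusername,p=secret
    pt-table-sync --replicate=percona.checksums --execute \
        h=10.64.68.253,u=replusername,p=secret
    ```

    Running with --print first lets you review the changes before --execute applies them.
    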

    If replication does not get off the ground, you will have to perform due diligence and reload the Slave.
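    Reloading the Slave usually means taking a consistent dump of the Master and re-seeding the Slave from it. A minimal sketch, assuming root credentials on both servers (the --master-data option embeds the matching binlog coordinates in the dump):

    ```shell
    # On the Master: dump all databases with binlog coordinates recorded
    # as a comment near the top of the dump file.
    mysqldump -u root -p --all-databases --single-transaction \
        --master-data=2 > master_dump.sql

    # On the Slave: stop replication, load the dump, then issue
    # CHANGE MASTER TO with the coordinates recorded in the dump.
    mysql -u root -p -e "STOP SLAVE;"
    mysql -u root -p < master_dump.sql
    grep -m1 "CHANGE MASTER" master_dump.sql   # shows the file/position to use
    ```
    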

    Hopefully, you may not have to do anything in this caveat if replication works with the original suggestion.