My DB slave's files were corrupted (lost disk array) and now we're jammed when attempting the recovery. The databases are rather large (100GB each), and using mysqldump images crash out anywhere from 40GB to 69GB, after a full DAY of processing. We've tried individual DBs, and we keep having similar issues. The fact it takes so long to reload the DBs, I am looking for guidance when using "raw files" from the master to create the slave.
I was able to get maintenance downtime approval to clone off the entire /var/lib/mysql from the master, and I need to know what NOT to copy into on my slave from those data files. I would assume everything. The "slave" only lost the data (/data/lib/mysql) and not anything else. I already have "skip-slave-start" enabled in the slave's my.conf, so it won't be syncing anytime soon. I also collected the master-data information, while all items were locked, so I have those details too.
So, what's in /var/lib/mysql that should be different between the master and the slave, before I attempt to restart the slave's mysqld, and later, start slave?
You will need the whole of /var/lib/mysql, and you don't need any downtime if:
1) All of your mysql files including the binlogs and transaction logs are in /var/lib/mysql AND
2) you have /var/lib/mysql on LVM (or better, on ZFS)
Make sure you are running with safe settings (flush_logs_at_trx_commit=1, sync_binlog=1).
Assuming you are on ZFS with data/mysql mounted on /var/lib/mysql (because I can't remember the LVM incantation from muscle memory), you would do somthing like this:
$ sudo mysql
mysql> FLUSH TABLES WITH READ LOCK;
mysql> \! sync
mysql> \! zfs snapshot data/mysql@transfer
mysql> UNLOCK TABLES;
Replace "zfs snapshot" line above with the relevant LVM snapshot command for your system.
You can now transfer the contents of that snapshot using tar (or zfs send) over netcat or ssh.
Purge /var/lib/mysql on the target completely before restoring.
Remove /var/lib/mysql/master.info on target after restoring.
Note the latest binlog name and size BEFORE you start mysqld.
Optional but recommended: set read_only=1 in my.cnf on the target server.
Now start mysqld. Run:
CHANGE MASTER TO ... MASTER_LOG_FILE='<name of latest binlog you noted earlier>', MASTER_LOG_POS=<size if binlog you noted above>;
START SLAVE;
Your newly reinitialize slave should now be up and replicating.