shell rsync failover

Stop rsync after a failover


I have two cPanel servers (A->B) with failover configured in dnsmadeeasy. I currently have an rsync job set up to sync the /home/account folder every 4 hours from A->B.

So when A fails, B takes over, lagging behind server A by up to 4 hours of data.

My problem is that when A comes back to normal after a failure, the rsync job overwrites the data on B with the data from A, since the sync direction is A->B.

I would like to know the best method to prevent the rsync from running after the first failover, so that I can handle the rsync manually. I am thinking of a shell script that tries to access a text file on server A and, if that fails, stops the cron job from running.

Is this a good way to handle this, or is there an easier way?


Solution

  • I have done something similar on a group of servers at the office. What has worked well is a cron script that keeps the status of each of the other servers in a temporary status file and updates that status with calls to ping.

    Specifically, the routine maintains a list of hosts to check. Each host (except the one matching the machine running the cron job) has a status file in the /tmp directory named hoststatus.$HOSTNAME. Each status file contains either up or down. (If the status file does not exist, it is created during the check and assumed up.) The status files give any script a local means of checking the status of each remote host before it runs.

    The cron job reads the status file for each remote host and passes the status to a case statement. When the status is up, the script calls ping -c1 hostname. If the ping succeeds, the script exits (the remote host is up). If the ping fails, the script waits 20 seconds (to ensure the remote host isn't just rebooting, etc.) and checks again. If the second ping succeeds, the status remains up and the script exits. If the second ping fails, the script waits another 20 seconds and tests a third time. If the third test fails, down is written to the status file and the remote host is considered down.

    Continuing in the case statement, if the initial status was down, a single check is made with ping. If it succeeds, the status is changed to up; if it fails, it remains down.

    A log file is also kept that reflects each change of status to provide a running history of server availability.
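
A rough sketch of the check described above might look like the following. It is not the actual script from my servers; the variable names (STATUS_DIR, LOG_FILE, WAIT_SECS, ATTEMPTS) and function names (probe_host, set_status, check_host) are made up for illustration, and the log format is an assumption.

```shell
#!/bin/sh
# Sketch of the ping-based status check. All names below are illustrative
# assumptions, not taken from a real script.

STATUS_DIR="${STATUS_DIR:-/tmp}"              # where hoststatus.<host> files live
LOG_FILE="${LOG_FILE:-/tmp/hoststatus.log}"   # assumed log location
WAIT_SECS="${WAIT_SECS:-20}"                  # pause between failed probes
ATTEMPTS="${ATTEMPTS:-3}"                     # failed probes needed to declare "down"

# One probe: succeeds if the host answers a single ping.
probe_host() {
    ping -c1 "$1" >/dev/null 2>&1
}

# Write the new status and append the change to the log.
set_status() {
    echo "$2" > "$STATUS_DIR/hoststatus.$1"
    echo "$(date '+%Y-%m-%d %H:%M:%S') $1 $2" >> "$LOG_FILE"
}

check_host() {
    host=$1
    file="$STATUS_DIR/hoststatus.$host"

    # A missing status file is created and assumed "up".
    [ -f "$file" ] || echo up > "$file"
    status=$(cat "$file")

    case "$status" in
        up)
            n=1
            while [ "$n" -le "$ATTEMPTS" ]; do
                probe_host "$host" && return 0          # still up, nothing to record
                # Wait before retrying, but not after the final attempt.
                [ "$n" -lt "$ATTEMPTS" ] && sleep "$WAIT_SECS"
                n=$((n + 1))
            done
            set_status "$host" down                     # every attempt failed
            ;;
        down)
            # A single successful ping is enough to mark the host up again.
            if probe_host "$host"; then
                set_status "$host" up
            fi
            ;;
    esac
}
```

From cron, the script would end with a loop over the host list that skips the local machine, for example: for h in hostA hostB; do [ "$h" = "$(hostname)" ] || check_host "$h"; done. The host names there are placeholders.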

    Something similar would work for your case. If server A goes down, server B could write a simple hold file in a similar fashion, something like rsynchold.hostA, that is checked before rsync is run in either direction (A->B or B->A). This would allow you to intervene manually for the first rsync after a failure, at which time you could reset the rsynchold.hostA file.

    This isn't elegant, but it has proven fairly foolproof over the past several years.
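
For the rsync hold itself, a minimal sketch could wrap the scheduled sync in a guard like the one below. The file locations, the function names (sync_allowed, run_sync), and the choice of /var/tmp for the hold file are assumptions for illustration; adjust them to wherever your status check writes its files.

```shell
#!/bin/sh
# Sketch of a hold-file guard around the scheduled rsync.
# Paths and function names are illustrative assumptions.

STATUS_FILE="${STATUS_FILE:-/tmp/hoststatus.hostA}"    # written by the status check
HOLD_FILE="${HOLD_FILE:-/var/tmp/rsynchold.hostA}"     # assumed location for the hold

# Returns 0 when the sync may run, 1 when a hold is in place.
sync_allowed() {
    if [ -f "$STATUS_FILE" ] && [ "$(cat "$STATUS_FILE")" = "down" ]; then
        # hostA is down: place the hold so later runs stay blocked
        # even after hostA comes back up.
        [ -f "$HOLD_FILE" ] || date > "$HOLD_FILE"
    fi
    [ ! -f "$HOLD_FILE" ]
}

# Run the given command only when no hold is in place.
run_sync() {
    if sync_allowed; then
        "$@"
    else
        echo "rsync hold in place ($HOLD_FILE); skipping sync" >&2
        return 1
    fi
}
```

The cron entry would then call something like run_sync rsync -a hostA:/home/account/ /home/account/ instead of calling rsync directly (the rsync arguments here are only an example). Once you have reconciled the data by hand, removing the hold file with rm lets the scheduled sync resume.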