Have program execution switch between computers/servers


I've got two servers and a program that I want to run on them (not necessarily simultaneously).

Let's call one server "SA" and the other one "SB".

SB is a backup for SA. While my program is running on SA, if SA fails, I want the program to immediately pick up where it left off and continue running on SB.

What is the easiest way I can accomplish this?


Solution

  • There are probably several ways to do this, but I'd use an exclusive file lock. For that to work, the two servers need enough network connectivity that both can open the same file for writing.

    Your basic algorithm (pseudocode) goes like this:

    File f;
    while (true) {
        myTurn = false;
        try {
            f = open the network file for exclusive writing;
            myTurn = true;
        } catch (IOException e) {
            // Expected while the other server holds the file, so nothing
            // is logged here (log the attempt if you want a trace).
            myTurn = false;
        }
        if (myTurn) {
            // Do all of your actual work here, looping as many times as
            // you need. Don't leave this block until the server wants to
            // shut down (or crashes), and don't close the file.
        } else {
            wait a few seconds before trying again;
        }
    }
    

    Basically what happens is that your app tries to open a file exclusively.

    If it can't open it, then the other server holds the lock and is doing the work, so this server should stay quiet.

    If it can open the file, then the other server is not running and this server should do the work.
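
    The open-or-stay-quiet check described above could be sketched in Java roughly as follows. This is a minimal sketch under assumptions, not the only way to do it: it uses an explicit `FileChannel.tryLock()` instead of relying on the open call itself failing, because a plain open for writing in Java does not necessarily fail when another process already has the file open. The names `FailoverLock`, `tryAcquire`, and `runWhenActive` are made up for illustration. How reliably locks behave on a network share depends on the operating system and file system, so try it against your real share first.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: holds an exclusive lock on a shared file.
final class FailoverLock implements AutoCloseable {
    private final FileChannel channel;
    private final FileLock lock;

    private FailoverLock(FileChannel channel, FileLock lock) {
        this.channel = channel;
        this.lock = lock;
    }

    /** Returns a held lock if it is this server's turn, or null if someone else has it. */
    static FailoverLock tryAcquire(Path lockFile) throws IOException {
        FileChannel channel = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        try {
            FileLock lock = channel.tryLock(); // null when another process holds the lock
            if (lock != null) {
                return new FailoverLock(channel, lock);
            }
        } catch (OverlappingFileLockException e) {
            // Already locked from inside this same JVM; treat it as "not my turn".
        } catch (IOException e) {
            channel.close();
            throw e;
        }
        channel.close();
        return null;
    }

    /** Releases the lock and closes the file so the other server can take over. */
    @Override
    public void close() throws IOException {
        lock.release();
        channel.close();
    }

    /** Standby loop: poll until the lock is free, then run the work while holding it. */
    static void runWhenActive(Path lockFile, Runnable work, long pollMillis)
            throws IOException, InterruptedException {
        while (true) {
            try (FailoverLock held = tryAcquire(lockFile)) {
                if (held != null) {
                    work.run(); // the lock stays held for as long as this runs
                    return;
                }
            }
            Thread.sleep(pollMillis); // stay quiet, then try again
        }
    }
}
```

    Each server would call something like `FailoverLock.runWhenActive(sharedPath, work, 5000)` with the same path on the network share. Both `sharedPath` and the five-second poll interval are placeholders.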

    For this to work, it's essential that the "work" routine does not hang. As long as the other server's process is alive, it keeps holding the network file lock, so if that process gets stuck in an infinite loop, the lock is never released and this server never takes over.
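
    One way to guard against a hung work routine is to run each unit of work under a time limit and give up the lock when the limit is exceeded. The sketch below is only an illustration of that idea, and it assumes your work can be split into steps with a sensible upper bound on their duration. `WorkWatchdog` and `finishedInTime` are hypothetical names.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: reports whether one step of work finished in time.
final class WorkWatchdog {
    /** Returns true if the step completed normally within timeoutMillis, false otherwise. */
    static boolean finishedInTime(Runnable step, long timeoutMillis)
            throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "work-step");
            t.setDaemon(true); // a stuck step must not keep the JVM alive
            return t;
        });
        Future<?> result = executor.submit(step);
        try {
            result.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            result.cancel(true); // interrupts the step; it may ignore the interrupt
            return false;
        } catch (ExecutionException e) {
            return false; // the step threw an exception
        } finally {
            executor.shutdownNow();
        }
    }
}
```

    If `finishedInTime` returns false, the simplest reaction is probably to end the process (for example with `System.exit(1)`), because a thread that ignores interrupts cannot be stopped from outside. Once the process exits, the operating system closes the file and the standby server can acquire the lock.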

    And remember, both servers must be trying to open the same network file. If each one opens a local file instead, neither lock blocks the other, and both servers will think it's their turn.

    This question has an example that you could probably reuse: "Getting notified when a file lock is released"