I have the following requirements for an application that many people will be using in the office - no Server component. All instances of client apps should somehow negotiate between themselves to decide which client will take on the server role. And the clients should communicate between themselves via IP.
If and when the client acting as server goes down, another client must take over in a seamless manner. I understand that having a server would be much, much simpler. But because the app must be very resilient, the powers that be do not want to risk the server (or even its backup) going down and would rather rely on this hybrid mesh connectivity, where the server role hops from client to client.
I think I got the app connectivity down. Basically, when the app starts, it announces itself via UDP (either to a predefined IP address that everything listens to or via UDP broadcast). From there on, the communication proceeds in a similar manner.
The part I am having problems with is how to negotiate/self-organize between the clients to pick one with a server role. And how to reliably detect that a client has gone down and then a new negotiation must take place. The final difficulty is replicating data that's been accumulated by the client with a server role.
I created a prototype in C# that communicates and tries to replicate data, but the negotiation part (particularly when coupled with a client failure) is where I am stuck.
Initially I thought that's what ZeroConf (aka Bonjour) did. But that just announces available network services.
Anyway, I do not want to reinvent the wheel, and I can't be the first person to want to do this. So my questions:
So, you currently have a system in which each client on a LAN will announce itself via UDP to the rest of the LAN. One of the client apps is a "server" and has some additional command & control powers on top of being a client in itself.
This is not a new idea, to be sure. What you want is to add some additional talking during the initial "here I am" connection communication. When a new client yells "here I am" to the rest of the LAN, if there is a server, the server should say "Welcome, I'm the server", and the new client app now knows which client is acting as the server. All other clients should probably say "hi" as well. If there isn't a server, the new client should first repeat the "hello" (it is UDP after all; you have to expect some messages not to be received), and if nobody responds, this new client is the only one on the network and is the "server" by default. If there are others but none are claiming to be the server, the clients can "discuss amongst themselves" to determine a new server.
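To make the three outcomes of the "here I am" exchange concrete, here is a minimal sketch of the decision a newcomer makes once all of its hello attempts have timed out. The names (`Reply`, `Outcome`, `decide_on_join`) are made up for illustration, and picking the lowest id when two clients both claim to be server is my own assumption, not something the scheme above requires. It is in Python only to keep it short; the logic should port to C# directly.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(Enum):
    JOIN_EXISTING = "join_existing"    # somebody answered "I'm the server"
    BECOME_SERVER = "become_server"    # nobody answered at all
    START_ELECTION = "start_election"  # peers exist, but none claims the role


@dataclass(frozen=True)
class Reply:
    sender: str      # hypothetical client id, e.g. "host:port"
    is_server: bool  # True for "Welcome, I'm the server", False for a plain "hi"


def decide_on_join(replies: list[Reply]) -> tuple[Outcome, Optional[str]]:
    """Decide what a newcomer does with the replies gathered over all hello retries."""
    servers = sorted(r.sender for r in replies if r.is_server)
    if servers:
        # Assumption: if two clients both claim the role, take the lowest id
        # so that every newcomer makes the same choice.
        return Outcome.JOIN_EXISTING, servers[0]
    if replies:
        return Outcome.START_ELECTION, None
    return Outcome.BECOME_SERVER, None
```

The caller is expected to have already repeated the hello a few times before calling this, so an empty list really does mean "nobody is out there" rather than "one datagram got lost".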
In addition to this, the server copy should periodically (maybe every 3-5 seconds) yell out "I'm still here" to everyone; this is known as a "heartbeat" message and is a very common alternative to the two-way "ping" method of verification.
If the server app (or any copy, really) closes normally, it should yell out "goodbye everyone, figure out who's the next server". The remaining clients can then discuss amongst themselves. If the client acting as server crashes, the clients will miss the server's "heartbeat" message and ask "who's the server?"; if still nobody responds, the clients will discuss amongst themselves.
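The heartbeat and goodbye handling on the client side can be sketched roughly like this. `HeartbeatMonitor` and its parameters are hypothetical names; the 4-second interval is picked from the 3-5 second range mentioned above, and tolerating three missed beats before raising the alarm is purely my assumption. The clock is injectable so the logic can be exercised without waiting.

```python
import time


class HeartbeatMonitor:
    """Tracks whether the current server should be considered gone."""

    def __init__(self, interval=4.0, missed_allowed=3, clock=time.monotonic):
        # Assumption: the server is only suspected after several beats in a
        # row are missing, since a single UDP heartbeat can simply get lost.
        self.timeout = interval * missed_allowed
        self.clock = clock
        self.last_seen = clock()
        self.said_goodbye = False

    def on_heartbeat(self):
        """Call whenever an "I'm still here" message arrives from the server."""
        self.last_seen = self.clock()
        self.said_goodbye = False

    def on_goodbye(self):
        """Call when the server announces a normal shutdown."""
        self.said_goodbye = True

    def server_lost(self):
        """True after a goodbye, or once the silence exceeds the timeout."""
        silent_for = self.clock() - self.last_seen
        return self.said_goodbye or silent_for > self.timeout
```

A client would poll `server_lost()` on a timer; when it returns true after silence (as opposed to a goodbye), that is the moment to ask "who's the server?" directly before starting any negotiation.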
Now, clients "discussing amongst themselves" can be as simple or as complex as you like. The simplest would be for whichever client says "OK, I'm server now" first to become server. You would probably have to include a timestamp in the message, so that if another computer says it at the same moment the clients can say "well, client 15 said it first, so we're going with him". Because clocks on different machines rarely agree exactly, it's worth also agreeing on a fixed tie-breaker, such as the lowest client ID. Alternatively, clients can "vote": each client measures the nominal latency between itself and every other client, then votes for the peer with the lowest-latency connection (no client may vote for itself unless it discovers it's the only one). The client with the most votes wins.
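Both flavours of "discussion" boil down to a small deterministic rule that every client applies to the same inputs. A rough sketch, with made-up function names and with "lowest id wins a tie" as my own assumption:

```python
from collections import Counter


def pick_by_claim(claims):
    """claims: iterable of (timestamp, client_id) "I'm server now" messages.

    The earliest timestamp wins; equal timestamps fall back to the lowest id.
    """
    return min(claims)[1]


def cast_vote(me, latency_ms):
    """latency_ms: dict of peer id -> round-trip time measured from `me`.

    Vote for the lowest-latency peer; a client votes for itself only when
    it has no peers at all.
    """
    others = {peer: rtt for peer, rtt in latency_ms.items() if peer != me}
    if not others:
        return me
    return min(others, key=lambda peer: (others[peer], peer))


def tally(votes):
    """votes: list of client ids. Most votes wins; ties go to the lowest id."""
    counts = Counter(votes)
    return min(counts, key=lambda cid: (-counts[cid], cid))
```

The important property is that the rule has no randomness in it: if every client sees the same set of claims or votes, every client names the same winner without a further round of messages.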
Or, a server can, as part of its "heartbeat" message, say "if I go down, my successor is client X"; and if a heartbeat is missed and the subsequent "are you still there, server" messages from the clients are not responded to, the clients can say "the king is dead! Long live King Client X!".
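The successor approach needs very little state on each client. Here is one way it might look; the JSON message layout, the `seq` counter for discarding stale heartbeats, and the names `make_heartbeat` and `SuccessionTracker` are all assumptions of mine for the sake of the example.

```python
import json


def make_heartbeat(server_id, successor_id, seq):
    """Build the server's "I'm still here, and my successor is X" message."""
    msg = {"type": "heartbeat", "server": server_id,
           "successor": successor_id, "seq": seq}
    return json.dumps(msg).encode("utf-8")


class SuccessionTracker:
    """Remembers the current server and the successor it last named."""

    def __init__(self, my_id):
        self.my_id = my_id
        self.server = None
        self.successor = None
        self.last_seq = -1

    def on_heartbeat(self, payload):
        msg = json.loads(payload.decode("utf-8"))
        if msg.get("type") != "heartbeat":
            return
        # Ignore duplicated or reordered heartbeats from the same server.
        if msg["server"] == self.server and msg["seq"] <= self.last_seq:
            return
        self.server = msg["server"]
        self.successor = msg["successor"]
        self.last_seq = msg["seq"]

    def on_server_dead(self):
        """Promote the named successor; return None if an election is needed."""
        heir = self.successor
        if heir is None or heir == self.server:
            return None
        self.server, self.successor, self.last_seq = heir, None, -1
        return heir

    def i_am_server(self):
        return self.server == self.my_id
```

If `on_server_dead()` returns `None` (no successor was ever named, or the successor has itself just died), the clients fall back to whichever negotiation scheme they otherwise use.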
Understand that, by necessity, this layer of governance for picking an "authoritative" client to become the server is going to dramatically increase the complexity of client communications. Also, while UDP lends itself to speedy communication, it makes no delivery guarantees: datagrams can be dropped, duplicated, or arrive out of order, and on a busy or shared network segment (Wi-Fi especially) that happens more often than you'd like. So, I would recommend using TCP instead of UDP for any communication in which a particular client must be heard. That includes any direct interrogation of a client ("are you still there, server?"), whatever process you use to have the clients decide on a new server, and so on.
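As an illustration of the "are you still there, server?" check over TCP, something along these lines would do. The wire format (`ARE_YOU_THERE` answered by `YES`), the function name, and the 2-second timeout are invented for the example; the only point is that a TCP connect plus a short timeout gives a definite yes or no, which a lone UDP datagram cannot.

```python
import socket


def server_answers(host, port, timeout=2.0):
    """Ask the presumed server directly; True only if it replies in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"ARE_YOU_THERE\n")
            # An empty read (peer closed the connection) also counts as "no".
            return conn.recv(64).startswith(b"YES")
    except OSError:
        # Refused, unreachable, or timed out: treat all of these as "no answer".
        return False
```

A client would typically call this only after the heartbeat has gone quiet, and start the negotiation only if it returns `False`.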