Tags: multithreading, concurrency, locking, distributed, lock-free

Concurrent readers and writers through cloned data structures?


I read this question but it didn't really help.

First and most important thing: time performance is the focus of the application I'm developing.

We have a client/server model (possibly distributed or cloud-based, if we wish) and a data structure D hosted on the server. Each client request consists of:

  1. Read something from D
  2. Possibly write something to D
  3. Possibly delete something from D

We can say that in this application the relative frequency of received operations can be described as delete << write << read. In addition:

  1. Read ops absolutely cannot wait: they must be processed immediately.
  2. Write and delete ops can wait for some time, but the sooner they are applied, the better.

Given the description above, any locking mechanism is unacceptable: it would imply that read operations could wait, which is not acceptable (sorry to stress it so much, but it's really a crucial point).

Consistency is not necessary: if a write/delete operation has been performed and a subsequent read doesn't see its effect, it's not a big deal. It would be better, but it's not required.

The solution should be data-structure-independent, so it shouldn't matter whether we write to a vector, list, map or Donald Trump's face.

The data structure could occupy a large amount of memory.
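
To make these requirements concrete, here is a minimal sketch of what such a front end could look like: reads are answered immediately from the current state, while writes and deletes are only queued and take effect later in a batch. All names (Store, Mutation, apply_pending) are made up for illustration; nothing here is from the question itself.

```cpp
// Reads served immediately (possibly stale); writes/deletes deferred to a log.
#include <deque>
#include <iostream>
#include <map>
#include <optional>
#include <string>

struct Mutation {
    enum Kind { Write, Delete } kind;
    std::string key, value;          // value is ignored for Delete
};

class Store {
public:
    // 1. Read: answered immediately, possibly stale (consistency not required).
    std::optional<std::string> read(const std::string& key) const {
        auto it = data_.find(key);
        if (it == data_.end()) return std::nullopt;
        return it->second;
    }
    // 2./3. Write/delete: merely enqueued; they take effect at the next apply.
    void write(std::string key, std::string value) {
        pending_.push_back({Mutation::Write, std::move(key), std::move(value)});
    }
    void erase(std::string key) {
        pending_.push_back({Mutation::Delete, std::move(key), {}});
    }
    // Drain the deferred mutations (e.g. periodically, or when load is low).
    void apply_pending() {
        for (const auto& m : pending_) {
            if (m.kind == Mutation::Write) data_[m.key] = m.value;
            else                           data_.erase(m.key);
        }
        pending_.clear();
    }

private:
    std::map<std::string, std::string> data_;
    std::deque<Mutation> pending_;
};

int main() {
    Store store;
    store.write("k", "v");
    std::cout << store.read("k").value_or("<stale miss>") << "\n"; // stale miss
    store.apply_pending();
    std::cout << store.read("k").value_or("<stale miss>") << "\n"; // "v"
}
```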

My solution so far:

We use two servers: the first server (called f) holds Df, the second server (called s) holds Ds, which is kept up to date.

f answers client requests using Df and forwards write/delete operations to s. Then s applies the write/delete operations to Ds sequentially.

At a certain point, all future client requests are redirected to s. At the same time, f copies s's updated Ds into its own Df.

Now the roles of f and s are swapped: s will answer client requests using Ds, and f will keep an up-to-date copy. The swapping process is repeated periodically.

Notice that I have deliberately omitted A LOT of details for simplicity (for example, once the swap has been done, f has to finish all of its pending client requests before applying the write/delete operations received from s in the meantime).

Why do we need two servers? Because the data structure is potentially too big to fit in a single server's memory.
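
To make the protocol concrete, here is a minimal single-process, single-threaded sketch of the idea. Networking, threading and the pending-request handling mentioned above are deliberately omitted, and all class and method names are hypothetical:

```cpp
// Two replicas: the active one serves reads, the standby absorbs the
// forwarded write/delete operations.  swap_roles() performs the bulk copy
// and the role flip described above.
#include <iostream>
#include <map>
#include <string>
#include <utility>

struct Replica {
    std::string name;
    std::map<std::string, std::string> data;   // D_f or D_s
};

class TwoServerStore {
public:
    // Reads are always served immediately from the currently active replica
    // (and may therefore be stale, which the requirements allow).
    std::string read(const std::string& key) const {
        auto it = active_->data.find(key);
        return it == active_->data.end() ? "<missing>" : it->second;
    }

    // Writes/deletes are "forwarded" to the standby (here: applied directly);
    // the active copy stays untouched until the next swap.
    void write(const std::string& key, const std::string& value) {
        standby_->data[key] = value;
    }
    void erase(const std::string& key) { standby_->data.erase(key); }

    // Periodic swap: the stale replica copies the standby's up-to-date state
    // in bulk, then the roles flip.
    void swap_roles() {
        active_->data = standby_->data;   // copy D_s into D_f
        std::swap(active_, standby_);
    }

private:
    Replica f_{"f"}, s_{"s"};
    Replica* active_  = &f_;   // serves reads
    Replica* standby_ = &s_;   // absorbs writes/deletes
};

int main() {
    TwoServerStore store;
    store.write("x", "1");
    std::cout << "before swap: " << store.read("x") << "\n";  // stale: <missing>
    store.swap_roles();
    std::cout << "after swap:  " << store.read("x") << "\n";  // sees "1"
}
```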

Now, my question is: is there a similar approach in the literature? I came up with this protocol in 10 minutes; I find it strange that no (better) solution similar to this one has already been proposed!

PS: I may have forgotten some application specs; don't hesitate to ask for any clarification!


Solution

  • The scheme that you have works. I don't see any particular problem with it. This is basically how many databases run their HA solutions: they apply a log of writes to replicas. This model affords a great deal of flexibility in how the replicas are formed, accessed and maintained. Failovers are easy, too.

    An alternative technique is to use persistent data structures. Each write returns a new, independent version of the data. All versions can be read in a stable and lock-free way. Versions can be kept or discarded at will, and they share as much of the underlying state as possible.

    Usually, trees underlie such persistent data structures, because it is easy to update a small part of the tree and reuse most of the old one (a minimal path-copying sketch appears below this answer).

    A reason you might not have found a more sophisticated approach is that your problem is extremely general: you want this to work with any data structure at all, and the data can be big.

    SQL Server Hekaton uses a quite sophisticated data structure to achieve lock-free, readable, point-in-time snapshots of any database contents. It may be worth looking at how they do it (they released a paper describing every detail of the system). They also allow for ACID transactions, serializability and concurrent writes. All lock-free.

    At the same time, f copies s updated Ds into its Df.

    This copy will take a long time because the data is big, and it will block readers. A better approach is to apply the log of writes to the writable copy before accepting new writes there; that way reads can be accepted continuously (a sketch of this log-replay switchover also appears below).

    The switchover is also a short period during which reads might have a slightly higher latency than normal.
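
Here is a minimal sketch of the persistent-data-structure idea mentioned in the answer, using a path-copying binary search tree: each insert returns a new version that shares the untouched subtrees with the old one, and any reader holding an old version can keep traversing it without locks, because published versions are never modified. This is illustrative only (Node, insert and find are made-up names); real persistent structures, and engines like Hekaton, are far more elaborate.

```cpp
// Path-copying persistent BST: writes build a new version, readers keep
// using whatever version they grabbed, with no locks and no mutation.
#include <iostream>
#include <memory>
#include <string>

struct Node {
    std::string key, value;
    std::shared_ptr<const Node> left, right;
};
using Tree = std::shared_ptr<const Node>;

// Returns a NEW tree containing (key, value); the old tree is untouched and
// still valid for any reader that holds it.
Tree insert(const Tree& t, const std::string& key, const std::string& value) {
    if (!t) return std::make_shared<const Node>(Node{key, value, nullptr, nullptr});
    if (key < t->key)
        return std::make_shared<const Node>(
            Node{t->key, t->value, insert(t->left, key, value), t->right});
    if (key > t->key)
        return std::make_shared<const Node>(
            Node{t->key, t->value, t->left, insert(t->right, key, value)});
    return std::make_shared<const Node>(Node{key, value, t->left, t->right});
}

const std::string* find(const Tree& t, const std::string& key) {
    if (!t) return nullptr;
    if (key < t->key) return find(t->left, key);
    if (key > t->key) return find(t->right, key);
    return &t->value;
}

int main() {
    Tree v1 = insert(nullptr, "a", "1");
    Tree v2 = insert(v1, "b", "2");   // v1 is still readable and unchanged
    std::cout << (find(v1, "b") ? "v1 sees b" : "v1 does not see b") << "\n";
    std::cout << (find(v2, "b") ? "v2 sees b" : "v2 does not see b") << "\n";
}
```

In a real server, the newest version could be published through an atomically updated pointer (for example std::atomic<std::shared_ptr<const Node>> in C++20), so readers grab a snapshot without ever blocking.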
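
And a sketch of the answer's suggested refinement to the switchover: instead of bulk-copying Ds into Df, keep the log of write/delete operations the stale copy has missed and replay only that log before flipping roles. Again, the names and the single-process setup are assumptions for illustration.

```cpp
// Log-replay switchover: df has been serving reads and is stale, ds is up to
// date, and `log` holds exactly the operations df has missed.  Replaying the
// log is much cheaper than copying all of ds when the data is big and the
// write rate is low (delete << write << read).
#include <iostream>
#include <map>
#include <string>
#include <vector>

struct Op {
    enum Kind { Write, Delete } kind;
    std::string key, value;   // value is unused for Delete
};

using Data = std::map<std::string, std::string>;

// Apply the missed operations to the stale replica.
void catch_up(Data& stale, const std::vector<Op>& log) {
    for (const auto& op : log) {
        if (op.kind == Op::Write) stale[op.key] = op.value;
        else                      stale.erase(op.key);
    }
}

int main() {
    Data df;                                         // stale copy
    Data ds = {{"x", "1"}};                          // up-to-date copy
    std::vector<Op> log = {{Op::Write, "x", "1"}};   // ops df has missed

    catch_up(df, log);       // cheap compared to copying all of ds
    log.clear();             // df and ds are now equivalent; flip roles here
    std::cout << "df now sees x = " << df["x"] << "\n";
}
```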