
What to pick: quorum or latest-timestamp data in Cassandra?


I was reading about Cassandra and learned that there is a QUORUM concept for dealing with consistency during reads (i.e., if a particular key is stored on multiple nodes/replicas, then during a read the value held by the majority of those replicas is chosen and returned).

My doubt may be silly, but I can't see how the quorum concept is useful when the majority value differs from the value with the latest timestamp. How do we then decide which value to return?

Example:

For a particular key "key1", with timestamps t1 > t2 (t1 is the more recent write), 5 replicas, and replica0 (the main node) down:

replicaNo   Value    Timestamp
replica1    value1   t1
replica2    value2   t2
replica3    value2   t2
replica4    value2   t2

So in the above case, which should be returned: the majority value (value2) or the latest-timestamp value (value1)?

Can someone please help?


Solution

  • In Cassandra, the last write always wins: for (a1, t1) and (a2, t2) with t2 > t1, value a2 is considered the correct one.
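
    A minimal Python sketch of last-write-wins reconciliation, applied to the live replicas from the question. The `Cell` type and `last_write_wins` function are hypothetical illustrations, not Cassandra APIs; timestamps are plain integers with t1 > t2 encoded as 2 > 1, and this sketch does not model how Cassandra breaks exact timestamp ties or handles tombstones.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    """One replica's copy of a column: a value and its write timestamp (hypothetical type)."""

    value: str
    timestamp: int


def last_write_wins(cells: list[Cell]) -> Cell:
    """Return the cell with the highest write timestamp.

    Simplification: exact timestamp ties and tombstones are not modelled.
    """
    if not cells:
        raise ValueError("no replica responses to reconcile")
    return max(cells, key=lambda cell: cell.timestamp)


# Replicas 1-4 from the question; t1 > t2 is encoded as 2 > 1.
t1, t2 = 2, 1
responses = [
    Cell("value1", t1),
    Cell("value2", t2),
    Cell("value2", t2),
    Cell("value2", t2),
]
print(last_write_wins(responses).value)  # value1
```

    Note that the majority plays no role here: three copies of value2 lose to a single newer copy of value1.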

    Regarding your question, a QUORUM read on its own is not that useful, because full (strong) consistency requires the following rule to hold:

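    RC + WC > RF

    (RC: read consistency level, i.e. the number of replicas that must answer a read; WC: write consistency level, i.e. the number of replicas that must acknowledge a write; RF: replication factor)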
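
    The rule can be checked with a small Python sketch. It assumes a single datacenter and models only the ONE, QUORUM and ALL levels (QUORUM being a strict majority, RF // 2 + 1); `replicas_required` and `is_strongly_consistent` are hypothetical helper names.

```python
def replicas_required(level: str, rf: int) -> int:
    """Number of replicas a single-datacenter consistency level waits for."""
    counts = {
        "ONE": 1,
        "QUORUM": rf // 2 + 1,  # strict majority of the replication factor
        "ALL": rf,
    }
    return counts[level]


def is_strongly_consistent(read_level: str, write_level: str, rf: int) -> bool:
    """RC + WC > RF: every possible read set overlaps every possible write set."""
    rc = replicas_required(read_level, rf)
    wc = replicas_required(write_level, rf)
    return rc + wc > rf


for read_level, write_level in [("QUORUM", "QUORUM"), ("QUORUM", "ONE"), ("ONE", "ALL")]:
    print(read_level, write_level, is_strongly_consistent(read_level, write_level, rf=5))
# QUORUM QUORUM True
# QUORUM ONE False
# ONE ALL True
```
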
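    In your example, where a majority of the replicas hold the old data, QUORUM increases the chance of getting the right data but does not guarantee it: a QUORUM read on RF = 5 waits for 3 replicas and returns the value with the latest timestamp among them, so it returns value1 only if replica1 is one of those 3.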
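
    To see why QUORUM alone does not guarantee the newest value in the question's example, the sketch below enumerates every set of 3 live replicas a QUORUM read could wait for and reconciles each set by latest timestamp. Which replicas the coordinator actually contacts is decided by Cassandra's snitch and is not modelled here; the data layout and the `quorum_read` helper are hypothetical.

```python
from itertools import combinations

# The question's example: replica0 is down, and t1 > t2 is encoded as 2 > 1.
live_replicas = {
    "replica1": ("value1", 2),
    "replica2": ("value2", 1),
    "replica3": ("value2", 1),
    "replica4": ("value2", 1),
}
RF = 5
QUORUM = RF // 2 + 1  # 3 replicas must respond


def quorum_read(contacted: tuple[str, ...]) -> str:
    """Return the value with the latest timestamp among the contacted replicas."""
    value, _timestamp = max(
        (live_replicas[name] for name in contacted), key=lambda cell: cell[1]
    )
    return value


outcomes = {
    subset: quorum_read(subset)
    for subset in combinations(sorted(live_replicas), QUORUM)
}
for subset, value in outcomes.items():
    print(subset, "->", value)
```

    Three of the four possible read sets include replica1 and return value1; the set (replica2, replica3, replica4) returns the stale value2.
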
    The most common setup is QUORUM for both reads and writes. With RF = 5, a QUORUM write means at least 3 nodes hold the new value. If we also read from 3 nodes, at least one of them must have the newer value, because at most 2 replicas can still hold the old one (3 + 3 > 5).
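
    The overlap argument can be verified by brute force. This hypothetical `find_disjoint_sets` helper numbers the replicas 0..RF-1 and searches for a write set and a read set that share no replica; if such a pair exists, a read at that level can miss the latest write.

```python
from itertools import combinations


def find_disjoint_sets(rf: int, wc: int, rc: int):
    """Search for a write set of size wc and a read set of size rc with no common replica.

    Returns the first such pair found, or None if every read set overlaps
    every write set (which, for wc and rc no larger than rf, happens exactly
    when rc + wc > rf).
    """
    replicas = range(rf)
    for written in combinations(replicas, wc):
        for read in combinations(replicas, rc):
            if set(written).isdisjoint(read):
                return written, read
    return None


print(find_disjoint_sets(rf=5, wc=3, rc=3))  # None: QUORUM writes + QUORUM reads
print(find_disjoint_sets(rf=5, wc=1, rc=3))  # ((0,), (1, 2, 3)): ONE writes + QUORUM reads
```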

    As for how reads work: when you ask for QUORUM on RF = 5, the coordinator node asks one replica for the actual data and 2 replicas for a digest (hash) of that data. It then computes the digest of the data it received from the first replica and compares it with the other 2 digests. If they match, the data from the first replica is returned. If they differ, a read repair is triggered: the coordinator fetches the full data, returns the version with the latest timestamp, and writes it back to the replicas that were out of date.
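
    The digest-read flow can be sketched in Python as follows. `Replica` and `quorum_read` are hypothetical in-memory stand-ins, not Cassandra internals; MD5 is used only as an example hash, timestamp ties are ignored, and the repair step is simplified to writing the newest cell back to the stale replicas among those contacted.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Replica:
    """Hypothetical in-memory replica holding one (value, timestamp) cell."""

    name: str
    value: str
    timestamp: int

    def digest(self) -> str:
        # A hash of the data, sent instead of the data itself.
        return hashlib.md5(f"{self.value}:{self.timestamp}".encode()).hexdigest()


def quorum_read(data_replica: Replica, digest_replicas: list[Replica]) -> str:
    """Digest read, followed by a simplified read repair on mismatch."""
    expected = data_replica.digest()
    if all(r.digest() == expected for r in digest_replicas):
        return data_replica.value  # digests agree: return the data as-is
    # Mismatch: use the full data of every contacted replica, keep the newest
    # cell, and write it back to the contacted replicas that are behind.
    contacted = [data_replica, *digest_replicas]
    newest = max(contacted, key=lambda r: r.timestamp)
    for r in contacted:
        if r.timestamp < newest.timestamp:
            r.value, r.timestamp = newest.value, newest.timestamp
    return newest.value


r2 = Replica("replica2", "value2", 1)
r1 = Replica("replica1", "value1", 2)
r3 = Replica("replica3", "value2", 1)
print(quorum_read(r2, [r1, r3]))  # value1
print(r2.value, r3.value)         # value1 value1
```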

    So if you write with consistency ONE on RF = 5, not only do you risk reading old data even with QUORUM reads (1 + 3 is not greater than 5), but if the only node that stored the new data fails, you could lose it altogether. Finding the right balance depends on the particular use case; if in doubt, use QUORUM for both reads and writes.
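
    The durability risk can be illustrated with the question's own data, read as the outcome of a write at consistency ONE where only replica1 stored value1. This is a hypothetical simulation that assumes the write never reaches another replica (for example via hinted handoff or repair) before replica1 fails.

```python
# The question's data, seen as the result of a write at consistency ONE:
# only replica1 stored value1 (t1 = 2); replica0 is already down.
replicas = {
    "replica1": ("value1", 2),
    "replica2": ("value2", 1),
    "replica3": ("value2", 1),
    "replica4": ("value2", 1),
}


def newest_value(cells) -> str:
    """Last write wins: return the value with the highest timestamp."""
    value, _timestamp = max(cells, key=lambda cell: cell[1])
    return value


print(newest_value(replicas.values()))  # value1 while replica1 is alive

# replica1 fails permanently before value1 is copied anywhere else.
del replicas["replica1"]
print(newest_value(replicas.values()))  # value2: the acknowledged write is lost
```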

    Hope this made sense,
    Cheers!