Tags: cassandra, replication, sharding, horizontal-scaling, nosql

NoSQL quorum compared to virtual sharding


After reading about a few NoSQL techniques, it looks to me like quorum loses out compared with virtual sharding. Virtual sharding allows scalability and does not increase the number of reads/writes across the system. What's also bad is that I absolutely can't find any benefit of quorum over sharding.

Question: could you act as an advocate of the quorum technique from the perspective of data consistency/performance/scalability, and shed some light on situations where it's better than sharding?

Below is my understanding of both techniques:

Quorum:

Suppose I have a booking system which demands high data consistency. One NoSQL approach to achieving data consistency is a quorum, meaning R + W > N, where R is the number of read nodes, W the number of write nodes, and N the total number of nodes.

As I understand it, if you use a quorum, then to write a row the database needs to perform the write operation W times, and to read something it needs to do R reads. Right?
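For illustration, here is a minimal Python sketch of why R + W > N guarantees that a read overlaps the latest write (hypothetical replica sets, not any particular database's implementation):

```python
import random

N = 5   # total replicas of a row
W = 3   # replicas that must acknowledge a write
R = 3   # replicas that must answer a read; R + W > N

replicas = [None] * N  # value currently stored on each replica

def write(value):
    # Only W replicas are guaranteed to hold the newest value.
    for i in random.sample(range(N), W):
        replicas[i] = value

def read():
    # Because R + W > N, any R replicas must intersect the last
    # write set, so at least one answer is the newest value.
    return [replicas[i] for i in random.sample(range(N), R)]

write("booking-42")
assert "booking-42" in read()  # the overlap guarantees a fresh copy
```

Real systems additionally use timestamps or versions to pick the newest value out of the R answers; the sketch only shows the overlap argument.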

Virtual sharding:

As I understand it, sharding is when there's something similar to a hash map which, by some criterion, tells you where incoming data should be stored and from where it should be read. Suppose you have N nodes. "Virtual" means that, in order to avoid scalability problems, the hash map is bigger than N, say 10*N entries. That allows you to easily reconfigure it when adding new nodes.
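A minimal sketch of that idea in Python (the node names and the 10*N slot count are just assumptions taken from the description above):

```python
import hashlib

nodes = ["node-0", "node-1", "node-2"]   # N physical nodes
SLOTS = 10 * len(nodes)                  # 10*N virtual shards

# The "hash map": virtual slot -> physical node. Rebalancing when a
# node is added only means reassigning some slots, not rehashing keys.
slot_to_node = {s: nodes[s % len(nodes)] for s in range(SLOTS)}

def node_for(key: str) -> str:
    slot = int(hashlib.md5(key.encode()).hexdigest(), 16) % SLOTS
    return slot_to_node[slot]

print(node_for("booking-42"))  # always routes to the same node

# Adding a node: hand a fraction of the slots to it; keys living in
# the untouched slots keep their placement.
nodes.append("node-3")
for s in range(0, SLOTS, 4):
    slot_to_node[s] = "node-3"
```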

What is extremely good about this is that it doesn't demand any replication, unlike quorum! Of course, for the sake of availability/failover you can add one master-slave backup for each node, but that won't increase the number of reads/writes in the system.


Solution

The key distinction to make here is that 'quorum' is a concept employed for eventual consistency among replicas in a partition, whereas 'sharding' is a concept for data partitioning and does not imply replication.

In a system like Cassandra, replication is not a requirement. You could use Cassandra for data partitioning/sharding only, assigning tokens to your nodes to establish ownership of data in the ring. Cassandra uses a concept called consistent hashing for distributing data across the nodes in your cluster.
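A rough sketch of the consistent-hashing idea, as a simplified token ring (this is an illustration, not Cassandra's actual implementation):

```python
import bisect
import hashlib

def token(value: str) -> int:
    # Hash anything onto a fixed ring of positions.
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

# Each node owns the ring range ending at its own token.
ring = sorted((token(n), n) for n in ["node-a", "node-b", "node-c"])
tokens = [t for t, _ in ring]

def owner(partition_key: str) -> str:
    # Walk clockwise to the first node token at or after the key's
    # token, wrapping around the ring if necessary.
    i = bisect.bisect_left(tokens, token(partition_key)) % len(ring)
    return ring[i][1]

print(owner("booking-42"))
```

The useful property is the same one the question attributes to virtual sharding: adding a node only moves the keys in the ring range it takes over, not every key in the system.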

Quorum is an available consistency level when reading and writing data in Cassandra. When you write to Cassandra, all replicas receive and process the write request regardless of the consistency level used; however, Cassandra responds to the request as soon as enough replicas have successfully processed the write to meet the consistency level. For reads the process is somewhat different: only enough nodes to meet the consistency level perform the read (in the normal case), with the contacted replicas returning digests over the data so the results can be compared.
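For example, with the DataStax Python driver you can set the consistency level per statement (the keyspace, table, and column names below are hypothetical):

```python
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("bookings")  # hypothetical keyspace

# The coordinator sends the write to all replicas, but acknowledges
# the request once a quorum of them (e.g. 2 of 3 with RF=3) respond.
insert = SimpleStatement(
    "INSERT INTO reservations (id, room) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.QUORUM,
)
session.execute(insert, ("r-42", "101"))

# A QUORUM read contacts enough replicas to meet the level; combined
# with QUORUM writes this gives read-your-writes consistency.
select = SimpleStatement(
    "SELECT room FROM reservations WHERE id = %s",
    consistency_level=ConsistencyLevel.QUORUM,
)
row = session.execute(select, ("r-42",)).one()
```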

As you indicate, without multiple replicas, availability is a problem. If you had a master-slave configuration for each shard in your example, you would effectively be writing the data twice. Whether the database responds to the write as soon as the master has processed it, or only once the write to the slave has completed as well, depends on the database solution and its configuration.

Cassandra excels at both partitioning/sharding and replication, and the same is true for other AP NoSQL solutions. Also, since Cassandra supports tunable consistency via consistency levels, you can find an ideal balance between availability and consistency for your application. By using a quorum consistency level you can survive the loss of replicas (e.g. with 3 replicas, you can survive the loss of 1 node in a partition) while your application continues to work.

The advantage of replication with quorum consistency (or any other consistency level, for that matter) in Cassandra over sharding plus a backup in another solution is that if the master of a shard/partition fails, that partition is unavailable until the backup becomes active. In an AP system like Cassandra, on a replica failure the system continues working without issue as long as the consistency level is met; there is no need for an 'active-passive switchover', which often cannot be made transparent (it really depends on the database solution). Additionally, with a high enough replication factor you can support the loss of multiple nodes in a partition (e.g. using QUORUM with an RF of 5 allows you to lose 2 nodes in a partition). Lastly, another advantage is that since you can have many active replicas within a partition, they can all serve requests simultaneously, whereas in a master-slave setup only the master services reads/writes. This can lead to much better performance at scale.
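A quick sanity check of those numbers, since a quorum is simply a majority of the replication factor:

```python
def quorum(rf: int) -> int:
    # Majority of replicas: floor(RF / 2) + 1.
    return rf // 2 + 1

for rf in (3, 5):
    tolerated = rf - quorum(rf)  # replicas you can lose and still meet QUORUM
    print(f"RF={rf}: quorum={quorum(rf)}, survives {tolerated} replica failure(s)")

# RF=3: quorum=2, survives 1 replica failure(s)
# RF=5: quorum=3, survives 2 replica failure(s)
```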