Tags: cassandra, replication, partitioning, distributed-system

How does Cassandra partitioning work when replication factor == cluster size?


Background:

I'm new to Cassandra and still trying to wrap my mind around the internal workings.

I'm thinking of using Cassandra in an application that will only ever have a limited number of nodes (fewer than 10, most commonly 3). Ideally each node in my cluster would have a complete copy of all of the application data, so I'm considering setting the replication factor to the cluster size. When additional nodes are added, I would alter the keyspace to increment the replication factor setting (and run nodetool repair to ensure the new nodes get the necessary data).

I would be using the NetworkTopologyStrategy for replication to take advantage of knowledge about datacenters.
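For concreteness, here's roughly what that keyspace change would look like via the DataStax Python driver (the `app` keyspace and `dc1` datacenter names are just placeholders):

```python
from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect()

# Initial 3-node cluster: every node holds a full copy of the data.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS app
    WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3}
""")

# After adding a fourth node: bump the replication factor to match,
# then run `nodetool repair app` so the new node streams its data.
session.execute("""
    ALTER KEYSPACE app
    WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 4}
""")
```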

In this situation, how does partitioning actually work? I've read about a combination of nodes and partition keys forming a ring in Cassandra. If all of my nodes are "responsible" for each piece of data regardless of the hash value calculated by the partitioner, do I just have a ring of one partition key?

Are there tremendous downfalls to this type of Cassandra deployment? I'm guessing there would be lots of asynchronous replication going on in the background as data was propagated to every node, but this is one of the design goals so I'm okay with it.

The consistency level on reads would generally be ONE or LOCAL_ONE.

The consistency level on writes would generally be TWO.
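With the DataStax Python driver, those consistency levels can be set per statement. A sketch (the `app` keyspace and `users` table are hypothetical examples):

```python
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('app')

# Writes: two replicas must acknowledge before success is reported.
write = SimpleStatement(
    "INSERT INTO users (id, name) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.TWO)
session.execute(write, (1, 'alice'))

# Reads: a single replica (in the local datacenter) is enough.
read = SimpleStatement(
    "SELECT name FROM users WHERE id = %s",
    consistency_level=ConsistencyLevel.LOCAL_ONE)
row = session.execute(read, (1,)).one()
```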

Actual questions to answer:

  1. Is replication factor == cluster size a common (or even a reasonable) deployment strategy aside from the obvious case of a cluster of one?
  2. Do I actually have a ring of one partition where all possible values generated by the partitioner go to the one partition?
  3. Is each node considered "responsible" for every row of data?
  4. If I were to use a write consistency of "one" does Cassandra always write the data to the node contacted by the client?
  5. Are there other downfalls to this strategy that I don't know about?

Solution

  • Do I actually have a ring of one partition where all possible values generated by the partitioner go to the one partition?

    Is each node considered "responsible" for every row of data?

    If all of my nodes are "responsible" for each piece of data regardless of the hash value calculated by the partitioner, do I just have a ring of one partition key?

    Not exactly. C* nodes still have token ranges, and C* still assigns a primary replica to the node "responsible" for each range. But with RF = N (where N is the number of nodes), every node also holds a replica of every row, so in essence the effect is the same as what you described; the toy sketch below illustrates this.
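    As a toy illustration of that point (this is not Cassandra's actual placement code; it's a simplified sketch assuming one token per node, ignoring vnodes, datacenters, and racks):

    ```python
    from bisect import bisect_left

    # Toy ring: one token per node (real clusters typically use vnodes).
    ring = [(-6148914691236517206, 'node1'),
            (0, 'node2'),
            (6148914691236517205, 'node3')]
    tokens = [t for t, _ in ring]

    def replicas(token, rf):
        """The first node clockwise whose token >= the key's token is the
        primary replica; the next rf - 1 nodes hold the other replicas."""
        start = bisect_left(tokens, token) % len(ring)
        return [ring[(start + i) % len(ring)][1] for i in range(rf)]

    print(replicas(42, rf=1))  # ['node3'] -- still a single, distinct primary
    print(replicas(42, rf=3))  # ['node3', 'node1', 'node2'] -- every node holds it
    ```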

  • Are there tremendous downfalls to this type of Cassandra deployment? Are there other downfalls to this strategy that I don't know about?

    Not that I can think of. I guess you might be more susceptible than average to inconsistent data, so use C*'s anti-entropy mechanisms to counter this (repair, read repair, hinted handoff); see the repair sketch below.

    Consistency level QUORUM or ALL would start to get expensive, but I see you don't intend to use them.
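    If you want routine repairs, a minimal sketch is to invoke nodetool on a schedule (this assumes `nodetool` is on the PATH and a keyspace named `app`; run it on each node, e.g. from cron):

    ```python
    import subprocess

    def repair_keyspace(keyspace):
        # Equivalent to running `nodetool repair <keyspace>` on this node.
        subprocess.run(['nodetool', 'repair', keyspace], check=True)

    repair_keyspace('app')
    ```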

  • Is replication factor == cluster size a common (or even a reasonable) deployment strategy aside from the obvious case of a cluster of one?

    It's not common; I'm guessing you're looking for very high availability and all of your data fits on one box. I don't think I've ever seen a C* deployment with RF > 5. By far the most common is RF = 3.

  • If I were to use a write consistency of "one" does Cassandra always write the data to the node contacted by the client?

    This depends on the load balancing policy configured at the driver. Often a token-aware policy is selected (assuming you're using one of the DataStax drivers), in which case requests are routed to the primary replica automatically. In your case you could use round robin and get the same effect, since every node is a replica; a sketch of both follows below.
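    For example, with the DataStax Python driver the policy might be configured like this (a sketch; the contact point and `dc1` datacenter name are placeholders):

    ```python
    from cassandra.cluster import Cluster
    from cassandra.policies import (DCAwareRoundRobinPolicy,
                                    RoundRobinPolicy, TokenAwarePolicy)

    # Token-aware routing: send each request to a replica for its key.
    cluster = Cluster(
        ['127.0.0.1'],
        load_balancing_policy=TokenAwarePolicy(
            DCAwareRoundRobinPolicy(local_dc='dc1')))

    # With RF == cluster size every node is a replica, so plain round
    # robin reaches a replica on every request too.
    cluster = Cluster(['127.0.0.1'],
                      load_balancing_policy=RoundRobinPolicy())
    ```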