cassandra consistent-hashing coordination

Classic Cassandra and Coordination

I am curious about coordination in classic Cassandra. I read the Facebook paper written by Avinash Lakshman and Prashant Malik called Cassandra - A Decentralized Structured Storage System

An excerpt from the paper Each data item identified by a key is assigned to a node by hashing the data item’s key to yield its position on the ring, and then walking the ring clockwise to find the first node with a position larger than the item’s position. This node is deemed the coordinator for this key. The application specifies this key and the Cassandra uses it to route requests. Thus, each node becomes responsible for the region in the ring between it and its predecessor node on the ring.

The part I am curious about is the last node in the ring, the one that points to the 1st node in the ring, and what range is it coordinating?

Thus, each node becomes responsible for the region in the ring between it and its predecessor node on the ring

I am trying to visualize the coordination scheme like so:

Question

Not sure how every node can be a coordinator though if, according to the description that each node is responsible for itself and its preceding node because then you would have coordinators overlapping. So in my screenshot 180,302, 502 and 771 would overlap if they were also coordinators.

Solution

The coordinator is really whoever you send the request to. Many strategies on the drivers will keep the ring data and send the request to one of the replicas, that way if consistency level is set to ONE it can do it all on that one host and remove the latency of another network hop from the request. Really you can send a request to any node in the ring and it will just mean an extra network hop (which may be required anyway if using stronger consistency).

Thing about a ring is there is no "last node" it wraps around. from any part of that ring you can go around clockwise and pick the other replicas. Consistent hashing is used in a lot of different places, if the wording has you confused try a different source (ie a presentation).

Terminology and concepts in Cassandra have changed quite a bit since the early days so when reading paper remember that it may not match up with how Cassandra works today.

The ring is a visualization. The actual implementation is more like having a list of tokens. Think:

[(a, 4), (b, 10), (c, 35), (d, 40)]

for a range of 1-50. Find the first token greater than your token the list, then continue down list until you have enough replicas to satisfy the replication factor. With a RF of 3 and a token of 6 you start with b since its first one greater, then include the next 2 so your replicas are [b, c, d]. No replica is more important than others or has any special control over the data (other then repairs). The "wrapping" around at end of the list is just simply that, a token of 41 goes to [a, b, c].