cassandra, cluster-computing, replication, database-replication, consistency

Is it possible to read data only from a single node in a Cassandra cluster with a replication factor of 3?


I know that Cassandra has different read consistency levels, but I haven't seen one that lets us read data by key from only one node. As I understand it, if we have a cluster with a replication factor of 3, then a read always asks all replica nodes. Even if we choose a consistency level of ONE, we still ask all of them and just wait for the first response from any node. That means a read puts load on not one node but 3 (4 counting the coordinator node). So I don't think we can really improve read performance even by setting a bigger replication factor.

Is it possible to really read from only a single node?


Solution

  • Are you using a Token-Aware Load Balancing Policy?

    If you are, and you are querying with a consistency level of LOCAL_ONE/ONE, a read query should only contact a single node.

    Give the article Ideology and Testing of a Resilient Driver a read. In it, you'll notice that using the TokenAwarePolicy has this effect:

    "For cases with a single datacenter, the TokenAwarePolicy chooses the primary replica to be the chosen coordinator in hopes of cutting down latency by avoiding the typical coordinator-replica hop."

    So here's what happens. Let's say that I have a table for keeping track of Kerbalnauts, and I want to get all data for "Bill." I would use a query like this:

    SELECT * FROM kerbalnauts WHERE name='Bill';
    

    The driver hashes my partition key value (name) to the token of 4639906948852899531 (SELECT token(name) FROM kerbalnauts WHERE name='Bill'; returns that value). If I am working with a 6-node cluster, then my primary token ranges will look like this:

    node   start range              end range
    1)     9223372036854775808 to  -9223372036854775808
    2)    -9223372036854775807 to  -5534023222112865485
    3)    -5534023222112865484 to  -1844674407370955162
    4)    -1844674407370955161 to   1844674407370955161
    5)     1844674407370955162 to   5534023222112865484
    6)     5534023222112865485 to   9223372036854775807
    

    As node 5 is responsible for the token range containing the partition key "Bill," my query will be sent to node 5. As I am reading at a consistency of LOCAL_ONE, there will be no need for another node to be contacted, and the result will be returned to the client...having only hit a single node.

    Note: Token range boundaries computed with (Python 3 syntax; integer division keeps the values exact):

    python3 -c 'print([str(((2**64 // 5) * i) - 2**63) for i in range(6)])'
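
To make the lookup in the walkthrough above concrete, here is a minimal Python sketch that rebuilds the six range end tokens from the same formula and finds which node's primary range holds a given token. The names `RANGE_ENDS` and `primary_node` are made up for this illustration. It is a simplified model only (one token per node, no vnodes, no replicas), not how the driver is actually implemented; a real driver reads the token map from cluster metadata.

```python
from bisect import bisect_left

# End token of each node's primary range (nodes 1..6), using the same
# formula as the note above. Integer division keeps the values exact.
RANGE_ENDS = [(2**64 // 5) * i - 2**63 for i in range(6)]


def primary_node(token: int) -> int:
    """Return the 1-based number of the node whose primary range holds token."""
    # A node owns every token up to and including its own end token, so the
    # owner is the first node whose end token is >= the given token.
    index = bisect_left(RANGE_ENDS, token)
    # Wrap around the ring for completeness; a signed 64-bit token never
    # exceeds the last end token, so this is only a safeguard.
    return index % len(RANGE_ENDS) + 1


# Token for name='Bill' from the example above.
print(primary_node(4639906948852899531))  # node 5
```

With a token-aware policy the driver does the equivalent lookup itself and sends the request straight to that node, which is why a LOCAL_ONE read can be served without involving any other node.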