cassandra cql3 cassandra-2.0

Placing data in specific nodes in Cassandra


In Cassandra, can we "fix" the node in which a specific partition key resides to optimize fetches?

This is an optimization for a specific keyspace and table where data written by one data center is never read by clients in a different data center. If a particular partition key will be queried only in a specific data center, is it possible to avoid network delays by "fixing" it to nodes of the same data center where it was written?

In other words, this is a use case where the schema is common across all data centers, but the data is never accessed across data centers. One way of doing this is to make the data center ID the partition key. However, a specific data center's data need not, and should not, be placed in other data centers. Can we optimize by somehow telling Cassandra the partition-key-to-data-center mapping?

Is a custom Partitioner the solution for this kind of use case?


Solution

  • The data is too voluminous to be replicated across all data centers, so I am resorting to creating one keyspace per data center.

    CREATE KEYSPACE "MyLocalData_dc1"
    WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'dc1' : 3, 'dc2' : 1, 'dc3' : 0, 'dc4' : 0};
    
    CREATE KEYSPACE "MyLocalData_dc2"
    WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'dc1' : 0, 'dc2' : 3, 'dc3' : 1, 'dc4' : 0};
    

    This way, MyLocalData generated by data center 1 has one backup copy in data center 2, and data generated by data center 2 is backed up in data center 3. Data is "fixed" in the data center where it is written and accessed, so cross-data-center network latency is avoided.
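
As a rough sketch of how a client in data center 1 might use such a keyspace, the statements below create a table in "MyLocalData_dc1" and then read and write at a data-center-local consistency level from cqlsh. The table name `events`, its columns, and the sample values are assumptions made up for illustration; only the keyspace name comes from the setup above.

```cql
-- Hypothetical table (name and columns are illustrative assumptions).
CREATE TABLE "MyLocalData_dc1".events (
    id      uuid,
    ts      timestamp,
    payload text,
    PRIMARY KEY (id, ts)
);

-- cqlsh command: wait only for a quorum of replicas in the local data center.
-- With 'dc1' : 3 this means 2 of the 3 dc1 replicas must acknowledge.
CONSISTENCY LOCAL_QUORUM;

-- The write is still sent to the backup replica in dc2,
-- but the coordinator does not wait for that remote acknowledgement.
INSERT INTO "MyLocalData_dc1".events (id, ts, payload)
VALUES (123e4567-e89b-12d3-a456-426655440000, '2014-01-01 00:00:00+0000', 'sample');

-- The read is answered by dc1 replicas only.
SELECT ts, payload
FROM "MyLocalData_dc1".events
WHERE id = 123e4567-e89b-12d3-a456-426655440000;
```

Which data center counts as "local" depends on the coordinator node handling the request, so this only helps if the client connects to nodes in dc1. A plain `QUORUM` would instead count replicas across all data centers and could reintroduce the cross-data-center round trip.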