
Cassandra Data Modelling and Designing the Clustering Key


I am a little confused about designing a data model for Cassandra, coming from a SQL background. I have gone through the DataStax documentation several times to understand Cassandra, but I am still stuck on one problem and am not sure how to overcome it or which type of data model I should opt for.

The primary key, along with clustering, is explained really well here. The documentation says that the primary key (partition key plus clustering keys) is the most important thing in a data model.

My use-case is pretty simple:

ITEM_ID    CREATED_ON     MOVED_FROM     MOVED_TO   COMMENT

ITEM_ID will be unique (the partition key), and each item might have 10-20 movement records. I want to get the movement records of an item sorted by the time they were created, so I decided to go with CREATED_ON as the clustering key.

According to the documentation, the clustering key comes under secondary indexes, whose values should repeat as much as possible, unlike the partition key. My data model fails exactly here. How can I preserve this ordering using clustering?

Obviously I can't put ID generation logic in the application, since it runs on many instances, and if I have to rely on such logic, the purpose of using Cassandra is defeated.


Solution

  • You do not actually need a secondary index for this particular example, and secondary indexes are not created by default. Your clustering key by itself will allow you to run queries that look like

    SELECT * from TABLE where ITEM_ID = SOMETHING; 
    

    This will automatically return results sorted by your clustering key, CREATED_ON.

    The reason is that your primary key basically creates partitions internally that look like

    ITEM_ID => [Row with first Created_ON], [Row with second Created_ON] ...
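
    As a minimal sketch, a table definition matching this layout could look like the following. The table name item_movements, the column types, the column name comment_text, and the sample value 'item-42' are assumptions for illustration and are not taken from the question.

    CREATE TABLE item_movements (
        item_id      text,       -- partition key: all movements of one item share a partition
        created_on   timestamp,  -- clustering key: rows in a partition are stored sorted by it
        moved_from   text,
        moved_to     text,
        comment_text text,
        PRIMARY KEY ((item_id), created_on)
    ) WITH CLUSTERING ORDER BY (created_on ASC);

    -- Movements of one item, oldest first (the stored order)
    SELECT * FROM item_movements WHERE item_id = 'item-42';

    -- Same partition, newest first
    SELECT * FROM item_movements WHERE item_id = 'item-42' ORDER BY created_on DESC;

    One caveat with this sketch: two movements of the same item with an identical created_on value would share a primary key, so the later write would overwrite the earlier one. If that can happen in your data, a timeuuid clustering column (for example, filled with the built-in now() function at insert time) is one commonly used way to keep such rows distinct without ID generation logic in the application.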