Cassandra schema design sorted by time


I'm new to Cassandra data modeling and I really need some advice. Here is my problem:

I need to create a new column family that will let me store scores and retrieve the most recently inserted ones:

CREATE TABLE average_score(
    audit_time timestamp PRIMARY KEY,
    pages_count int,
    score float,
);

The inserted data is not sorted by primary key (I'm using the default random partitioner). Is there a way to solve this? Can I specify a different partitioner just for this column family?

Thanks


Solution

  • Here is an example of an hour-partitioned series table that might clarify some things for you:

    CREATE TABLE average_score(
        hour timestamp,
        audit_time timeuuid,
        pages_count int,
        score float,
        PRIMARY KEY (hour, audit_time)
    )
    WITH CLUSTERING ORDER BY (audit_time DESC);
    
    • Because it comes first, hour is our "partition" key, i.e. it will be used to physically distribute our data across the cluster. (When you write, you will have to supply this value, rounded down to the start of the current hour.)

    • audit_time is our first "clustering" key, i.e. it is used to order and identify rows in a given hour partition on a particular node. We've chosen timeuuid to prevent overwrites: two scores recorded at the same timestamp still get distinct keys. (You can pull out the actual time with the dateOf function. See http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/timeuuid_functions_r.html)

    • WITH CLUSTERING ORDER BY (audit_time DESC) directs C* to store rows within a partition in descending order on disk, which is probably the right decision if you intend to use ORDER BY audit_time DESC in most of your queries. (See http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/refClstrOrdr.html)
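To make the write and read side of the table above a little more concrete, here is a minimal client-side sketch in Python. It only computes the hour value and holds the CQL text. The names hour_bucket, insert_params, INSERT_SCORE and SELECT_LATEST are made up for illustration, and the "?" bind markers assume you run the statements as prepared statements through whatever driver you use.

```python
from datetime import datetime, timezone

# Hypothetical statement text, meant to be prepared by your driver.
# now() is the CQL function that generates a fresh timeuuid for audit_time.
INSERT_SCORE = (
    "INSERT INTO average_score (hour, audit_time, pages_count, score) "
    "VALUES (?, now(), ?, ?)"
)

# Within one hour partition, rows come back newest first because of
# CLUSTERING ORDER BY (audit_time DESC), so LIMIT picks the latest scores.
SELECT_LATEST = (
    "SELECT dateOf(audit_time), pages_count, score "
    "FROM average_score WHERE hour = ? LIMIT 10"
)


def hour_bucket(ts: datetime) -> datetime:
    """Round a timestamp down to the start of its hour (the partition key)."""
    return ts.replace(minute=0, second=0, microsecond=0)


def insert_params(ts: datetime, pages_count: int, score: float) -> tuple:
    """Bind values for INSERT_SCORE, in the order of its "?" markers."""
    return (hour_bucket(ts), pages_count, score)


if __name__ == "__main__":
    audit = datetime(2014, 3, 5, 14, 37, 12, tzinfo=timezone.utc)
    print(insert_params(audit, 42, 0.5))
```

To read the latest scores you would bind hour_bucket(now) to SELECT_LATEST. If the current hour has fewer rows than you need, repeat the query with the previous hour.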

    Caveat: Although we've partitioned the data fairly granularly, every write made during a given hour lands in that hour's single partition, so you will still have a bit of a write hotspot if you're just appending new scores as they are registered.
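A quick way to see the hotspot is to count how many partition keys a burst of writes actually touches. The sketch below is plain Python and does not talk to Cassandra. The helper names and the simulated load (1000 writes, one every 2 seconds) are assumptions picked for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def hour_bucket(ts: datetime) -> datetime:
    """Round a timestamp down to the start of its hour (the partition key)."""
    return ts.replace(minute=0, second=0, microsecond=0)


def writes_per_partition(start: datetime, count: int, interval: timedelta) -> Counter:
    """Count simulated writes per hour partition for evenly spaced timestamps."""
    timestamps = (start + i * interval for i in range(count))
    return Counter(hour_bucket(ts) for ts in timestamps)


if __name__ == "__main__":
    start = datetime(2014, 3, 5, 14, 5, tzinfo=timezone.utc)
    load = writes_per_partition(start, count=1000, interval=timedelta(seconds=2))
    for partition, writes in load.items():
        print(partition.isoformat(), writes)
```

With this load, all of the simulated writes fall into the 14:00 partition, which is the hotspot the caveat describes.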

    Cheers!

    P.S. If you're still stuck, the DataStax CQL documentation is a great resource for data modeling help.