I'm new to Cassandra data modeling and I really need some advice. Here is my problem:
I need to create a new column family that will allow me to store and retrieve the last inserted scores:
CREATE TABLE average_score(
    audit_time timestamp PRIMARY KEY,
    pages_count int,
    score float
);
The inserted data is not sorted by primary key (I'm using the default random partitioner). Do you have a solution? Can I specify a different partitioner just for this column family?
Thanks
Here is an example of an hour-partitioned series table that might clarify some things for you:
CREATE TABLE average_score(
    hour timestamp,
    audit_time timeuuid,
    pages_count int,
    score float,
    PRIMARY KEY (hour, audit_time)
)
WITH CLUSTERING ORDER BY (audit_time DESC);
Because it comes first, hour is our "partition" key, i.e. it will be used to physically distribute our data across the cluster. (When you write, you will have to supply this value, rounded down to the start of the current hour.)
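As a rough illustration of the rounding step, here is a minimal Python sketch. The helper names (hour_bucket, insert_params) and the INSERT_CQL string are made up for this example; the %s placeholders assume the style used by the DataStax Python driver, and now() is the CQL function that generates a timeuuid on the server side.

```python
from datetime import datetime, timezone

# Hypothetical parameterized statement. %s placeholders assume the DataStax
# Python driver; now() is the CQL function that generates a timeuuid.
INSERT_CQL = (
    "INSERT INTO average_score (hour, audit_time, pages_count, score) "
    "VALUES (%s, now(), %s, %s)"
)


def hour_bucket(moment: datetime) -> datetime:
    """Round a datetime down to the start of its hour (the partition key value)."""
    return moment.replace(minute=0, second=0, microsecond=0)


def insert_params(moment: datetime, pages_count: int, score: float) -> tuple:
    """Build the parameter tuple for INSERT_CQL (hour, pages_count, score)."""
    return (hour_bucket(moment), pages_count, score)


if __name__ == "__main__":
    current = datetime.now(timezone.utc)
    print(INSERT_CQL)
    print(insert_params(current, 12, 87.5))
```

Note that the hour is computed on the client while now() runs on the server, so a write issued right at an hour boundary could end up in the previous hour's partition. If that matters, generating the timeuuid on the client from the same instant is one way around it.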
audit_time is our first "clustering" key, i.e. it is used to order and identify rows in a given hour partition on a particular node. We've chosen timeuuid to prevent overwrites. (You can pull out the actual time with the dateOf function. See http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/timeuuid_functions_r.html)
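To see why a timeuuid avoids collisions and still carries the time, here is a small sketch using only Python's standard uuid module. A timeuuid is a version 1 UUID; the helper timeuuid_to_datetime is a hypothetical client-side counterpart to dateOf, not part of any Cassandra API.

```python
import uuid
from datetime import datetime, timezone

# Number of 100-nanosecond intervals between the UUID epoch (1582-10-15)
# and the Unix epoch (1970-01-01).
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def timeuuid_to_datetime(value: uuid.UUID) -> datetime:
    """Extract the timestamp embedded in a version 1 UUID as a UTC datetime."""
    if value.version != 1:
        raise ValueError("not a time-based (version 1) UUID")
    unix_seconds = (value.time - UUID_EPOCH_OFFSET) / 1e7
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc)


if __name__ == "__main__":
    # Two UUIDs generated back to back are distinct, so two scores written
    # within the same millisecond still get separate clustering keys.
    first, second = uuid.uuid1(), uuid.uuid1()
    print(first != second)
    print(timeuuid_to_datetime(first))
```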
WITH CLUSTERING ORDER BY (audit_time DESC) directs C* to store rows within a partition in descending order on disk, which is probably the right decision if you intend on using ORDER BY audit_time DESC in most of your queries. (See http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/refClstrOrdr.html)
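With this layout, reading the last inserted scores means querying the current hour's partition and, if it holds too few rows, stepping back to earlier hours. The sketch below is one possible way to do that, not the only one. SELECT_CQL, latest_scores and the fetch_partition callback are all hypothetical names, and the callback stands in for whatever driver call actually runs the query.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical statement: with CLUSTERING ORDER BY (audit_time DESC), rows in a
# partition come back newest first, so no ORDER BY clause is needed here.
SELECT_CQL = (
    "SELECT audit_time, pages_count, score FROM average_score "
    "WHERE hour = %s LIMIT %s"
)


def latest_scores(fetch_partition, now, limit, max_hours_back=24):
    """Collect up to `limit` newest rows, walking hour partitions backwards.

    fetch_partition(hour, limit) is a caller-supplied function (an assumption)
    that runs SELECT_CQL and returns that hour's rows, newest first.
    """
    hour = now.replace(minute=0, second=0, microsecond=0)
    rows = []
    for _ in range(max_hours_back + 1):
        if len(rows) >= limit:
            break
        rows.extend(fetch_partition(hour, limit - len(rows)))
        hour -= timedelta(hours=1)
    return rows


if __name__ == "__main__":
    # In-memory stand-in for the table: {hour: [rows, newest first]}.
    h = datetime(2014, 3, 5, 14, tzinfo=timezone.utc)
    table = {h: ["c"], h - timedelta(hours=1): ["b", "a"]}

    def fake_fetch(hour, n):
        return table.get(hour, [])[:n]

    print(latest_scores(fake_fetch, h + timedelta(minutes=20), 2))
```

The max_hours_back bound keeps the loop from scanning indefinitely when the table is empty or scores are sparse; the right value depends on how often scores are written.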
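Caveat: Although we've partitioned the data fairly granularly, you will still have a bit of a write hotspot if you're just appending new scores as they are registered, because every write within a given hour lands on the same partition (and therefore on the same replica nodes).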
Cheers!
P.S. If you're still stuck, the DataStax CQL documentation is a great resource for data modeling help.