I'm trying to model a table in Cassandra, I'm quite new and stumbled upon one problem. I've got the following:
CREATE TABLE content_registry (
service text,
file text,
type_id tinyint,
container text,
status_id tinyint,
source_location text,
expiry_date timestamp,
modify_date timestamp,
create_date timestamp,
to_overwrite boolean,
PRIMARY KEY ((service), file, type_id)
);
So as I understand:
service
is my partition key and based on this value hashes will be generated and values will be split in clusterfile
is clustering keytype_id
is clustering keyWhat I've figured out is that whenever I'll insert new data, Cassandra will upsert (either insert or update if the value with that compound primary key exists)
Now what I'm struggling is, that I want my data to come back sorted by create_date
in descending order, however create_date
is not part of primary key.
If I add create_date
to my primary key, I won't be able to upsert data, because create_date
means timestamp when record was inserted, so if I add it to primary key every time there's an insert, I'll end up with multiple records.
What are the other options? Order in application? That doesn't seem very efficient.
What I've figured out is that whenever I'll insert new data, Cassandra will upsert (either insert or update if the value with that compound primary key exists)
Totally right.
Now what I'm struggling is, that I want my data to come back sorted by create_date in descending order, however create_date is not part of primary key. If I add create_date to my primary key, I won't be able to upsert data, because create_date means timestamp when record was inserted, so if I add it to primary key every time there's an insert, I'll end up with multiple records.
With these sentences you are actually contradicting.
If create_date
isn't part of your key but a property and the data is upserted, it means that the records are always the same. Therefore when querying by the key and fetching create_date
you always have the latest. If you actually want to have the date when the record got created you should just not override the data anymore after the first time you inserted that record.
If it's the case you want to represent a series of data, you indeed need to avoid upserting, this is could be done by using create_date
as additional partition key. I'd rather prefeer using time_uuid
which comes with quite handy functions.
Last but not least, the most interesting question is, what actually the usecase is that you want to reflect. When modelling data in cassandra you always should know your queries you need to run in advance.