I need to create a Cassandra column family with following attributes.
id bigint,
content varchar,
year int,
frequency int,
I want to get the content with highest frequency in a given year using this column family. Also when inserting data to table, for given content and year, I need to check if an id already exist or not. How can I achieve this with Cassandra?
I tried creating CF using
CREATE TABLE sinmin.word_time_inv_frequency (
id bigint,
content varchar,
year int,
frequency int,
PRIMARY KEY((year), frequency)
);
and then retrieved data using
SELECT id FROM word_time_inv_frequency WHERE year = 2010 ORDER BY frequency ;
But when using this, I can't check if entry is already existing for the (content,year) pair in the CF.
You can use a compound partition key to be able to select by (content, year) and still be able to order by frequency:
with this table you'd be able to
create table test2 (
id bigint,
content varchar,
year int,
frequency int,
PRIMARY KEY((year, content), frequency)
);
Your query would work as follows:
select * from test2 where content ='puppies' and year=2014 order by frequency ;
It is best practice to maintain different tables to address querying needs. You can look into some of the integrations in datastax enterpise (search/analytics) for ad-hoc queries.
Please check the following videos on data modeling for an in-depth look at c* data modeling: https://www.youtube.com/playlist?list=PL75iJfNDd0_FI-Ia_b4z8aoDdOCQTi35I
Also consider using Clustering Order By in your table definition if you are always going to be pulling data in the same order.