Search code examples
cassandradatamodel

Cassandra data model for querying column with non-unique values


I am trying to come up with a data model that works for my situation. I have the following columns: runid, stat1, stat2, stat3.

I will be querying based on runid (ex. select * from table where runid=123) but runid will have repeated values so I cant just make it a primary key. Also, when querying I do not know anything about stat1-3 so I cannot make those columns a part of the primary key either.

(Background: My Cassandra instance is populated by another program and my program will just pull information specific to a particular runid from it and display it on a screen)

This seems like a common enough scenario but I'm new to Cassandra. I know runid needs to be a part of the primary key since I have to run queries based on it but it has repeated values.

Any suggestions?


Solution

  • You are correct in thinking this is a common enough scenario and fortunately you don't have to know the full primary key to make queries! Just the partition key (the first part of the primary key is required in your select criteria.

    For example, if you create your table like this:

    CREATE TABLE test.runs (
        stat1 text,
        stat2 text,
        stat3 text,
        runid int,
        PRIMARY KEY (runid, stat1)
     );
    

    You should be able to retrieve data by just specifying the partition key (runid):

    cassandra@cqlsh:test> insert into runs (runid, stat1, stat2, stat3) values (0, '1', 'hi', 'hi'); 
    cassandra@cqlsh:test> insert into runs (runid, stat1, stat2, stat3) values (0, '2', 'lo', 'lo');
    cassandra@cqlsh:test> insert into runs (runid, stat1, stat2, stat3) values (0, '3', 'yo', 'yo');
    cassandra@cqlsh:test> insert into runs (runid, stat1, stat2, stat3) values (1, '1', '22', '33');
    cassandra@cqlsh:test> select * from runs where runid = 0;
    
     runid | stat1 | stat2 | stat3
    -------+-------+-------+-------
         0 |     1 |    hi |    hi
         0 |     2 |    lo |    lo
         0 |     3 |    yo |    yo
    

    That being said, it would probably be good to come up with a better secondary value for your primary key to allow multiple 'stat1' columns to have the same value within a runid, maybe a random uuid to create some unique value?