Tags: java, cassandra, cql, cql3, nosql

How would one model something updateable, like the “status” of a thing, in Cassandra CQL3 and be able to query on that status?


This is a bit of a contrived example to illustrate my question, but let's say I have a Car entity which contains Lightbulb entities. A car has several lightbulbs, each of which could be "on", "off" or "broken".

Each type of lightbulb has a unique id. (left headlight = 100, right headlight = 101... that sort of thing)

The status of a lightbulb needs to be constantly updated.

What I'd like to do is query for a specific car for a set of lightbulbs with a specific status.

Something like: "give me all the lightbulbs with status 'on' for car 'chevy', model 'nova', vin 'xyz-123'".

create table lightbulbstatus (
   bulbid uuid,
   carmake text,
   carmodel text,
   carvin uuid,
   lastupdate timestamp,
   status int,  
                   /* row key */               /* col keys */
   PRIMARY KEY( (carmake, carmodel, carvin), ?   ?    ?    ?)
);

I believe the row key should have the car coordinate in it, but beyond that, I'm a bit lost. I assume each time there is a status change to a bulb, we add a column. But I'm not sure what the keys should be in the column to make the query work.

I think in RDBMS-land, you could use a subselect or nested query to find bulbs with status = on.

select * from lightbulbstatus where status = 1 and lastupdate > (select lastupdate from lightbulbstatus where status != 1);

No idea how you would do this in CQL3. Obviously sub-selects are not allowed.


Solution

  • Since you do not need to maintain a status history, I would suggest a single row per bulb, using the following primary key:

    PRIMARY KEY( (carmake, carmodel, carvin), bulbid)
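
    To make the "single row per bulb" idea concrete, here is a minimal in-memory sketch in Java. It is not Cassandra code: the class `CarPartition` and its methods are hypothetical names, and a plain map stands in for one partition (one car). It only illustrates that writing the same primary key twice replaces the earlier value instead of adding a row.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical stand-in for one partition (one car) of lightbulbstatus.
final class CarPartition {
    // Clustering column bulbid -> status (1 = on, 0 = off, as in the queries here).
    private final Map<UUID, Integer> statusByBulb = new HashMap<>();

    // Like a CQL write: the same primary key again overwrites the old value.
    void upsert(UUID bulbId, int status) {
        statusByBulb.put(bulbId, status);
    }

    int rowCount() {
        return statusByBulb.size();
    }

    Integer statusOf(UUID bulbId) {
        return statusByBulb.get(bulbId);
    }

    public static void main(String[] args) {
        CarPartition nova = new CarPartition();
        UUID bulb = UUID.fromString("9124f318-8253-4d94-b865-3be07899c8ff");
        nova.upsert(bulb, 1); // bulb switched on
        nova.upsert(bulb, 0); // bulb switched off: replaces the row, no history kept
        System.out.println(nova.rowCount() + " row(s), status " + nova.statusOf(bulb));
        // prints "1 row(s), status 0"
    }
}
```

    If status history were required, the key would need an extra clustering column so that each change lands in its own row; with this key, the latest write is the only state kept.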
    

    In order to query lightbulbs by status you need to create a secondary index:

    CREATE INDEX lightbulb_by_status ON lightbulbstatus (status);
    
    SELECT * FROM lightbulbstatus 
      WHERE status = 1 
        AND carmake = 'chevy' 
        AND carmodel = 'nova'
        AND carvin = cfe638e9-5cd9-43c2-b5f4-4cc9a0e6b0ff;
    

    Although status has low cardinality, the query also restricts the full partition key, so the index lookup is confined to a single partition and stays efficient.
    If the number of rows to filter is very small (such as the lightbulbs in one car), consider filtering by status in the application and skipping the secondary index.
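
    Application-side filtering could look roughly like the sketch below. It assumes the rows of one car's partition have already been fetched and are available as a map from bulbid to status; the class `BulbStatusFilter` and the method `bulbsWithStatus` are hypothetical names, and no driver code is shown.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical helper: filters already-fetched rows of one car by status.
final class BulbStatusFilter {
    // Returns the ids of bulbs whose status equals `wanted`, in iteration order of the map.
    static List<UUID> bulbsWithStatus(Map<UUID, Integer> statusByBulb, int wanted) {
        List<UUID> matches = new ArrayList<>();
        for (Map.Entry<UUID, Integer> row : statusByBulb.entrySet()) {
            if (row.getValue() == wanted) {
                matches.add(row.getKey());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Assumed sample data; in practice these rows come from a SELECT on the partition key.
        Map<UUID, Integer> rows = new LinkedHashMap<>();
        rows.put(UUID.fromString("9124f318-8253-4d94-b865-3be07899c8ff"), 1); // on
        rows.put(UUID.fromString("00000000-0000-0000-0000-000000000101"), 0); // off
        System.out.println(bulbsWithStatus(rows, 1));
        // prints "[9124f318-8253-4d94-b865-3be07899c8ff]"
    }
}
```

    With only a handful of bulbs per car, the cost of reading the whole partition and discarding non-matching rows is negligible, which is why the index becomes optional.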

    If you need to guard against an obsolete status update overwriting a more recent one (as your RDBMS query suggests), consider using lightweight transactions:

    UPDATE lightbulbstatus SET status = 0, lastupdate = '2014-11-08 23:50:30+0019'
      WHERE carmake = 'chevy' 
        AND carmodel = 'nova' 
        AND carvin = cfe638e9-5cd9-43c2-b5f4-4cc9a0e6b0ff 
        AND bulbid = 9124f318-8253-4d94-b865-3be07899c8ff 
      IF status = 1 AND lastupdate < '2014-11-08 23:50:30+0019';
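
    The effect of that IF clause can be sketched in plain Java. This is an in-memory analogy only: the class `BulbRow` and the method `updateIf` are hypothetical, and `synchronized` merely stands in for the coordination Cassandra performs for a lightweight transaction. The boolean result plays the role of the `[applied]` column returned by a conditional update.

```java
import java.time.Instant;

// Hypothetical in-memory model of a single lightbulbstatus row.
final class BulbRow {
    int status;
    Instant lastUpdate;

    BulbRow(int status, Instant lastUpdate) {
        this.status = status;
        this.lastUpdate = lastUpdate;
    }

    // Mirrors: UPDATE ... IF status = expectedStatus AND lastupdate < newUpdate.
    // Returns true if the write was applied, false if a condition failed.
    synchronized boolean updateIf(int expectedStatus, int newStatus, Instant newUpdate) {
        if (status != expectedStatus || !lastUpdate.isBefore(newUpdate)) {
            return false; // stale or unexpected state: leave the row untouched
        }
        status = newStatus;
        lastUpdate = newUpdate;
        return true;
    }

    public static void main(String[] args) {
        BulbRow row = new BulbRow(1, Instant.parse("2014-11-08T23:00:00Z"));
        // Newer update: applied.
        System.out.println(row.updateIf(1, 0, Instant.parse("2014-11-08T23:50:30Z")));
        // Older update arriving late: rejected, the row keeps status 0.
        System.out.println(row.updateIf(0, 1, Instant.parse("2014-11-08T23:10:00Z")));
    }
}
```

    A caller that gets a rejected result can re-read the row and decide whether its update is still relevant.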
    

    Hope it helps.