Search code examples
cassandracassandra-2.0clustered-indexnosql

Cassandra - alternate way for clustering key with ORDER BY and UPDATE


My schema is :

CREATE TABLE friends (
     userId timeuuid,
     friendId timeuuid,
     status varchar, 
     ts timeuuid,   
     PRIMARY KEY (userId,friendId)
);

CREATE TABLE friends_by_status (
    userId timeuuid,
    friendId timeuuid,
    status varchar, 
    ts timeuuid,   
    PRIMARY KEY ((userId,status), ts)
)with clustering order by (ts desc);

Here, whenever a friend-request is made, I'll insert record in both tables. When I want to check one to one status of users, i'll use this query:

SELECT status FROM friends WHERE userId=xxx AND friendId=xxx;

When I need to query all the records with pending status, i'll use :

SELECT * FROM friends_by_status WHERE userId=xxx AND status='pending';

But, when there is a status change, I can update the 'status' and 'ts' in the 'friends' table, but not in the 'friends_by_status' table as both are part of PRIMARY KEY.

You could see that even if I denormalise it, I definitely need to update 'status' and 'ts' in 'friends_by_status' table to maintain consistency.

Only way I can maintain consistency is to delete the record and insert again.
But frequent delete is also not recommended in cassandra model. As said in Cassaandra Spottify summit.

I find this as the biggest limitation in Cassandra.

Is there any other way to sort this issue.

Any solution is appreciated.


Solution

  • I don't know how soon you need to deploy this, but in Cassandra 3.0 you could handle this with a materialized view. Your friends table would be the base table, and the friends_by_status would be a view of the base table. Cassandra would take care updating the view when you changed the base table.

    For example:

    CREATE TABLE friends ( userid int, friendid int, status varchar, ts timeuuid, PRIMARY KEY (userId,friendId) );
    CREATE MATERIALIZED VIEW friends_by_status AS
        SELECT userId from friends WHERE userID IS NOT NULL AND friendId IS NOT NULL AND status IS NOT NULL AND ts IS NOT NULL
        PRIMARY KEY ((userId,status), friendID);
    
    INSERT INTO friends (userid, friendid, status, ts) VALUES (1, 500, 'pending', now());
    INSERT INTO friends (userid, friendid, status, ts) VALUES (1, 501, 'accepted', now());
    INSERT INTO friends (userid, friendid, status, ts) VALUES (1, 502, 'pending', now());
    SELECT * FROM friends;                
    
     userid | friendid | status   | ts
    --------+----------+----------+--------------------------------------
          1 |      500 |  pending | a02f7fe0-49f9-11e5-9e3c-ab179e6a6326
          1 |      501 | accepted | a6c80980-49f9-11e5-9e3c-ab179e6a6326
          1 |      502 |  pending | add10830-49f9-11e5-9e3c-ab179e6a6326
    

    So now in the view you can select rows by the status:

    SELECT * FROM friends_by_status WHERE userid=1 AND status='pending';
    
     userid | status  | friendid
    --------+---------+----------
          1 | pending |      500
          1 | pending |      502
    
    (2 rows)
    

    And then when you update the status in the base table, it automatically updates in the view:

    UPDATE friends SET status='pending' WHERE userid=1 AND friendid=501;
    SELECT * FROM friends_by_status WHERE userid=1 AND status='pending';
    
     userid | status  | friendid
    --------+---------+----------
          1 | pending |      500
          1 | pending |      501
          1 | pending |      502
    
    (3 rows)
    

    But note that in the view you couldn't have ts as part of the key, since you can only add one non-key field from the base table as part of the key in the view, which in your case would be adding 'status' to the key.

    I think the first beta release for 3.0 is coming out tomorrow if you want to try this out.