
Cassandra Schema for retrieving date-ordered records


Folks, I would like to solve the following with one table in Cassandra. Our service tracks when users open an asset. On subsequent opens of the same asset, we simply overwrite the accessDate.

example record:

{ userId: "string", assetId: "string", accessDate: unixTimestamp }

With this said, we need to fulfill the following access requirements (each requirement has its own bullet point for readability):

  • Be able to return all assets a user has opened, and at what time.

This is easy to achieve; the table could look like:

CREATE TABLE user_assets_tracker (
   userId uuid,
   accessDate timestamp,
   assetId uuid,
   PRIMARY KEY (userid, accessDate, assetId)
);

This allows us to query for all assets, and when each was last accessed.

SELECT *
FROM user_assets_tracker
WHERE userId = 522b1fe2-2e36-4cef-a667-cd4237d08b89
ORDER BY accessDate DESC;

Dandy. Now the harder bits, which I am unsure about and was hoping you folks could chime in on:

  • Show me all the assets user added in the past 30 days.

Naturally the LIMIT here is not what we need. Also, we may need to have 2 tables to achieve this.

SELECT *
FROM user_assets_tracker
WHERE userid = 522b1fe2-2e36-4cef-a667-cd4237d08b89
ORDER BY accessDate DESC
LIMIT 10; -- ?????

  • Show me the last accessed item for the user. I think this one is easier; LIMIT 1 solves that.

This is probably straightforward, with this schema:

CREATE TABLE user_assets_tracker (
   userId uuid,
   accessDate timestamp,
   assetId uuid,
   PRIMARY KEY (userid, accessDate, assetId)
);

SELECT *
FROM user_assets_tracker
WHERE userid = 522b1fe2-2e36-4cef-a667-cd4237d08b89
ORDER BY accessDate DESC
LIMIT 1;

  • Retrieve the full record for a particular userId + assetId

Since accessDate comes before assetId in our schema, I am not sure how to do this either. Another table?

Thanks!!

PS: It seems that a SASI index could be the solution.


Solution

  • Since you always read the assets ordered by accessDate DESC,
    define the table with that clustering order:

    CREATE TABLE user_assets_tracker (
        userid uuid,
        accessdate timestamp,
        assetid uuid,
        PRIMARY KEY (userid, accessdate, assetid)
    ) WITH CLUSTERING ORDER BY (accessdate DESC, assetid ASC);
    

    Now you don't need to specify ORDER BY accessDate DESC on every query; rows within a partition are returned in accessDate DESC order by default.

    • Show me all the assets user added in the past 30 days.

    First, compute the timestamp for 30 days ago.
    Suppose the timestamp for 30 days ago is 2017-02-05 12:00:00+0000.
    You can then query:

    SELECT * FROM user_assets_tracker WHERE userid = 522b1fe2-2e36-4cef-a667-cd4237d08b89 AND accessdate >= '2017-02-05 12:00:00+0000';
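
    The cutoff is computed on the client before the query runs. A minimal Python sketch, assuming the application works in UTC and passes a timezone-aware datetime (the helper name `cutoff_literal` is made up for illustration):

```python
from datetime import datetime, timedelta, timezone

def cutoff_literal(now: datetime, days: int = 30) -> str:
    """Return a CQL timestamp literal for `days` before `now`, in UTC."""
    cutoff = now.astimezone(timezone.utc) - timedelta(days=days)
    # Same layout as the literal used in the query above.
    return cutoff.strftime("%Y-%m-%d %H:%M:%S+0000")

# Example: 30 days before 2017-03-07 12:00 UTC.
now = datetime(2017, 3, 7, 12, 0, 0, tzinfo=timezone.utc)
print(cutoff_literal(now))  # 2017-02-05 12:00:00+0000
```

    With a driver you would normally bind the datetime as a query parameter rather than format a string; the literal form is shown only to match the query above.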
    
    • Retrieve the full record for a particular userId + assetId

    If you are using Cassandra 3.0 or above, you can use a materialized view.
    Create the materialized view:

    CREATE MATERIALIZED VIEW user_assets AS
        SELECT *
        FROM user_assets_tracker
        WHERE userid IS NOT NULL AND assetid IS NOT NULL AND accessdate IS NOT NULL
        PRIMARY KEY (userid, assetid, accessdate);
    

    To get all rows for a given userid and assetid, query the view:

    SELECT * FROM user_assets WHERE userid = 522b1fe2-2e36-4cef-a667-cd4237d08b89 AND assetid = 1d45e6c2-02a1-11e7-aac5-b9ab92bee74c;
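
    Note that accessdate is part of the primary key, so a later access to the same asset writes a new row instead of replacing the old one. The query above can therefore return several rows per asset unless older rows are deleted on write. A minimal Python sketch of keeping only the most recent access on the client, assuming rows arrive as (assetid, accessdate) pairs (the function name and row shape are illustrative, not a driver API):

```python
from datetime import datetime, timezone

def latest_per_asset(rows):
    """Map each asset id to its newest access date.

    `rows` is any iterable of (asset_id, access_date) pairs.
    """
    latest = {}
    for asset_id, access_date in rows:
        if asset_id not in latest or access_date > latest[asset_id]:
            latest[asset_id] = access_date
    return latest

# Example with two accesses to asset "a" and one to asset "b".
rows = [
    ("a", datetime(2017, 3, 1, tzinfo=timezone.utc)),
    ("b", datetime(2017, 2, 20, tzinfo=timezone.utc)),
    ("a", datetime(2017, 3, 5, tzinfo=timezone.utc)),
]
print(latest_per_asset(rows)["a"].isoformat())  # 2017-03-05T00:00:00+00:00
```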
    

    One more thing: if a huge amount of data is inserted for a single user, you should add a time bucket alongside userid in the partition key. For more, see the answer at https://stackoverflow.com/a/41857183/2320144
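
    As a rough sketch of the bucketing idea, assume a hypothetical month-sized `bucket` text column and a partition key of ((userid, bucket)), neither of which is in the schema above. The client derives the bucket from the access time on write and, for a 30-day window, queries every bucket the window touches. The helper names below are made up for illustration:

```python
from datetime import datetime, timedelta, timezone

def month_bucket(ts: datetime) -> str:
    """Bucket label (YYYY-MM, UTC) for a timezone-aware timestamp."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m")

def buckets_between(start: datetime, end: datetime) -> list:
    """Month buckets touched by the range [start, end], newest first."""
    start = start.astimezone(timezone.utc)
    end = end.astimezone(timezone.utc)
    buckets = []
    year, month = end.year, end.month
    # Walk backwards one month at a time until we pass the start month.
    while (year, month) >= (start.year, start.month):
        buckets.append(f"{year:04d}-{month:02d}")
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return buckets

now = datetime(2017, 3, 7, 12, 0, tzinfo=timezone.utc)
print(month_bucket(now))                              # 2017-03
print(buckets_between(now - timedelta(days=30), now))  # ['2017-03', '2017-02']
```

    Month-sized buckets are only one possible choice; the right bucket size depends on how many rows a single user produces per period.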