Cassandra Schema for retrieving date-ordered records

Folks, I would like to solve the following with one table in Cassandra. The service tracks when users open an asset. On subsequent events for the same asset, we simply overwrite the accessDate.

example record:

{ userId: "string", assetId: "string", accessDate: unixTimestamp }

With that said, we need to fulfill the following access requirements (each requirement has its own bullet point for readability):

Be able to return all assets a user has opened, and at what time.

This is easy to achieve; the table could look like:

CREATE TABLE user_assets_tracker (
   userId uuid,
   accessDate timestamp,
   assetId uuid,
   PRIMARY KEY (userid, accessDate, assetId)
);

This allows us to query for all assets, and when each was last accessed.

SELECT *
FROM user_assets_tracker
WHERE userId = 522b1fe2-2e36-4cef-a667-cd4237d08b89
ORDER BY accessDate DESC;

Dandy. Now for the harder bits, which I am unsure about; I was hoping you folks could chime in:

Show me all the assets the user added in the past 30 days.

Naturally the LIMIT here is not what we need. Also, we may need to have 2 tables to achieve this.

SELECT *
FROM user_assets_tracker
WHERE userid = 522b1fe2-2e36-4cef-a667-cd4237d08b89
ORDER BY accessDate DESC
LIMIT 10; ?????

Show me the last accessed item for the user. I think this one is easier, the LIMIT 1 solves that.

This is probably straightforward with this schema:

CREATE TABLE user_assets_tracker (
   userId uuid,
   accessDate timestamp,
   assetId uuid,
   PRIMARY KEY (userid, accessDate, assetId)
);

SELECT *
FROM user_assets_tracker
WHERE userid = 522b1fe2-2e36-4cef-a667-cd4237d08b89
ORDER BY accessDate DESC
LIMIT 1;

Retrieve the full record for a particular userId + assetId

Since accessDate comes before assetId in our schema, I am not sure how to do this either. Another table?

Thanks!!

PS: It seems that a SASI index could be the solution.

Solution

Since you always select assets ordered by accessDate descending, define the table with a descending clustering order on accessdate:

CREATE TABLE user_assets_tracker (
    userid uuid,
    accessdate timestamp,
    assetid uuid,
    PRIMARY KEY (userid, accessdate, assetid)
) WITH CLUSTERING ORDER BY (accessdate DESC, assetid ASC);

Now you don't need to specify ORDER BY accessDate DESC in every query. Rows within a partition are returned ordered by accessdate descending by default.

Show me all the assets the user added in the past 30 days.
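To make the effect of the clustering order concrete, here is a small in-memory sketch in Python. It is not Cassandra code: the rows and the `clustering_order` helper are hypothetical, and plain strings stand in for uuid values, whose real comparison rules differ. It only mimics how rows inside one userid partition are laid out under `accessdate DESC, assetid ASC`, which is why a `LIMIT 1` read of the partition yields the most recently accessed asset.

```python
from datetime import datetime, timezone

# Hypothetical rows of one partition (a single userid): (accessdate, assetid).
rows = [
    (datetime(2017, 3, 1, 9, 0, tzinfo=timezone.utc), "asset-b"),
    (datetime(2017, 3, 6, 18, 30, tzinfo=timezone.utc), "asset-c"),
    (datetime(2017, 3, 1, 9, 0, tzinfo=timezone.utc), "asset-a"),
    (datetime(2017, 1, 15, 8, 0, tzinfo=timezone.utc), "asset-a"),
]


def clustering_order(partition):
    """Sort like CLUSTERING ORDER BY (accessdate DESC, assetid ASC)."""
    return sorted(partition, key=lambda r: (-r[0].timestamp(), r[1]))


ordered = clustering_order(rows)

# The equivalent of LIMIT 1: the first row is the latest access.
latest = ordered[0]
print(latest[1])
```

Note that the two rows sharing the same accessdate are tie-broken by assetid ascending, matching the second clustering column.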

First compute the timestamp for 30 days ago.
Suppose the timestamp for 30 days ago is 2017-02-05 12:00:00+0000.
Now you can query:

SELECT * FROM user_assets_tracker WHERE userid = 522b1fe2-2e36-4cef-a667-cd4237d08b89 AND accessdate >= '2017-02-05 12:00:00+0000';
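The cutoff has to be computed on the client before the query is issued. A minimal sketch of that step in Python, using only the standard library; the helper name `cutoff_literal` is hypothetical, and the fixed "now" is chosen so that the result matches the timestamp used in the query above.

```python
from datetime import datetime, timedelta, timezone


def cutoff_literal(now, days=30):
    """Return the instant `days` before `now` as a CQL timestamp string in UTC."""
    cutoff = now.astimezone(timezone.utc) - timedelta(days=days)
    return cutoff.strftime("%Y-%m-%d %H:%M:%S+0000")


# Assumed "current time" for the example; in practice use datetime.now(timezone.utc).
now = datetime(2017, 3, 7, 12, 0, 0, tzinfo=timezone.utc)
print(cutoff_literal(now))  # 2017-02-05 12:00:00+0000
```

In real application code it is usually safer to pass the datetime as a bound parameter through the driver rather than formatting it into the query string.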

Retrieve the full record for a particular userId + assetId

If you are using Cassandra 3.0 or above, you can use a materialized view.
Create the materialized view:

CREATE MATERIALIZED VIEW user_assets AS
    SELECT *
    FROM user_assets_tracker
    WHERE userid IS NOT NULL AND assetid IS NOT NULL AND accessdate IS NOT NULL
    PRIMARY KEY (userid, assetid, accessdate);

Now, if you want to get all the data for a given userid and assetid, here is the query:

SELECT * FROM user_assets WHERE userid = 522b1fe2-2e36-4cef-a667-cd4237d08b89 AND assetid = 1d45e6c2-02a1-11e7-aac5-b9ab92bee74c;

One more thing: if a huge amount of data is inserted for a single user, you should add a time bucket to the partition key alongside userid. For more, check the answer at https://stackoverflow.com/a/41857183/2320144
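As a rough illustration of the time-bucket idea, the sketch below assumes a monthly bucket stored as a "YYYY-MM" string that would sit next to userid in the partition key. The granularity, the string format, and the helper names `month_bucket` and `buckets_between` are all assumptions made for this example and may differ from the linked answer. The point is that a "past 30 days" read then has to query every bucket the window touches, which can be two or three months.

```python
from datetime import datetime, timedelta, timezone


def month_bucket(ts):
    """Bucket value to store with a row: the UTC year and month of its access time."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m")


def buckets_between(start, end):
    """All monthly buckets touched by [start, end], newest first."""
    start = start.astimezone(timezone.utc)
    end = end.astimezone(timezone.utc)
    buckets = []
    year, month = end.year, end.month
    while (year, month) >= (start.year, start.month):
        buckets.append(f"{year:04d}-{month:02d}")
        month -= 1
        if month == 0:  # roll back from January to the previous December
            year, month = year - 1, 12
    return buckets


now = datetime(2017, 3, 7, 12, 0, 0, tzinfo=timezone.utc)
cutoff = now - timedelta(days=30)
print(buckets_between(cutoff, now))  # ['2017-03', '2017-02']
```

Each returned bucket would be queried with the same accessdate >= cutoff condition, and the results concatenated in bucket order to keep the newest-first ordering.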