I'm in initial stages of data modeling in Cassandra for an application. This application has existing relational persistence layer, which has to be replaced with Cassandra.
Application uses a table named login_log
for users, which provides last login time for any users in all applications.
Here's the CQL
that I'm using to create this table
create table login_log (
user_id int,
application_name text,
login_date timestamp,
primary key (user_id, application_name, login_date)
) with clustering order by (application_name asc, login_date desc)
user_id
is partition key. application_name
and login_date
are clustering keys. This table maintains a history of logins, data is never deleted in it.
I'm trying to retrieve last login date of given user_id
for all applications in one query.
If I were writing a SQL
query to do the same it would look something like below
select user_id, application_name, max(login_date) from login_log group by user_id, application_name
However it cannot be done in Cassandra, since there's no group by clause or aggregation functions. Arguably clustering columns are already grouped but I'm unable to create a query which retrieves latest login_date
for all applications in one query.
If were doing it for one application CQL
would look like below
select * from login_log where user_id = ? and application_name = ? limit 1
Since cluster are already ordered by login_date
, order by
is not needed. I need to extend the same query to retrieve data for all applications in one go.
Is it possible to do this in Cassandra? If not is there a data modeling technique that will allow me to do this?
Any hint is appreciated.
GROUP BY is supported from version 3.10 check improvement ticket here and Cassandra official documentation.