Search code examples
sortingloggingcassandradatabase-performancedatabase-partitioning

How do I fetch sorted logs in Cassandra?


In my application we store logs in Cassandra. User can see the logs after giving a start and an end date for the logs. We fetch the data on the basis of these dates and have implemented pagination as well such that the end date of page one becomes the start date for page 2.

Table:

CREATE TABLE audit_trail (
    account_id bigint,
    user_id bigint,
    time timestamp,
    category int,
    ip_address text,
    action_description text,
    additional_data map<text,text>,
    source int,
    source_detail varchar,
    PRIMARY KEY ( (account_id), time )
     ) WITH CLUSTERING ORDER BY (time DESC);

Problem:

The logs we get are not sorted but scattered. For example upon hitting the query for logs of day 1 to 10 we might be getting logs for day 10,8,9,2,1, or in any other order.

Aim:

  1. to get the logs in sorted order such that logs from day 1 are shown at the top then day 2 and so on.
  2. no data shuffling. As, upon collision the table is restructured in Cassandra which might give in data we already have seen in page 1 on page 2 again.

Data throughput is large, usually around 1000 logs per hour.


Solution

  • WITH CLUSTERING ORDER BY (time DESC);

    Adding this at the end of the table solved the problem for me.