Search code examples
cassandracql

How to get data in descending order of Timestame in cassandra


I have existing data table in cassandra date table with primary key is id

SELECT * FROM Op_History ORDER BY create_time DESC limit 100;

I tried this one but getting error as : InvalidRequest: Error from server: code=2200 [Invalid query] message="ORDER BY is only supported when the partition key is restricted by an EQ or an IN.


Solution

  • primary key is id

    So with Cassandra, you need to design your tables to support a specific query. With the PRIMARY KEY being id, really the only query it will support is each individual row by the id.

    I'd recommend building a query table for that data like this:

    CREATE TABLE op_history (
        id UUID,
        create_time TIMESTAMP,
        day_bucket INT,
        op_data TEXT,
        PRIMARY KEY ((day_bucket),create_time,id))
    WITH CLUSTERING ORDER BY (create_time DESC, id ASC);
    

    By partitioning on day_bucket, I'm ensuring that all data for a specific day is stored together. I'm not sure about the business case behind op_history, but if you need to query for an entire month's worth of data, then you would use something like month_bucket instead.

    Now, I can filter on rows for a specific day:

    > SELECT * FROM op_history WHERE day_bucket=20221221;
    
     day_bucket | create_time                     | id                                   | op_data
    ------------+---------------------------------+--------------------------------------+---------
       20221221 | 2022-12-21 14:42:58.552000+0000 | 59b0a30b-213b-4847-bd3e-134a641be21f | Hello4!
       20221221 | 2022-12-21 14:42:56.057000+0000 | 7148d5b3-77d7-4088-8c6d-f2e4c73175f2 | Hello3!
       20221221 | 2022-12-21 14:42:53.866000+0000 | b23f4556-2a72-4014-a6e9-7a2ceb55217c | Hello2!
       20221221 | 2022-12-21 14:42:47.738000+0000 | 51d09afa-806e-4bec-b6bf-94eb1a67910d |  Hello!
    
    (4 rows)
    

    With the CLUSTERING ORDER defined, I won't need an ORDER BY clause.

    As I dont have chance to change table creation

    Oh, I'm not suggesting that. I'm suggesting that you create a new table with a different primary key definition, and load the same data into it. That's actually the best practice for data modeling in Cassandra.

    is there any possibilities i.e.. ALLOW FILTERING

    So using the ALLOW FILTERING directive is generally considered to be "bad practice," because it consumes too many resources. If the query has to talk to too many nodes, it could likely time out or even crash the coordinator node. Also, ALLOW FILTERING still won't let an ORDER BY to be applied to it.

    One thing a lot of teams end up doing is building a Spark cluster to work with Cassandra data. Spark can pull data from Cassandra and work with it in RAM to perform ANSI-compliant SQL operations on it. That would allow you to apply an ORDER BY.

    On the other hand, you could try ALLOW FILTERING and then perform the sort on the application side. Definitely not ideal.