Search code examples
cassandraprimary-keycqlperformancecounter

How to find range in Cassandra Primary key?


Use case: Find maximum counter value in a specific id range

I want to create a table with these columns: time_epoch int, t_counter counter

The frequent query is:

select time_epoch, MAX t_counter where time_epoch >= ... and time_epoch < ...

This is to find the counter in specific time range. Planning to make time_epoch as primary key. I am not able to query the data. It is always asking for ALLOW FILTERING. Since its a very costly function, We dont want to use it.

How to design the table and query for the use case.


Solution

  • Let's assume that we can "bucket" (partition) your data by day, assuming that enough write won't happen in a day to make the partitions too large. Then, we can cluster by time_epoch in DESCending order. With time based data, storing data in descending order often makes the most sense (as business reqs usually care more about the most-recent data).

    Therefore, I'd build a table like this:

    CREATE TABLE event_counter (
        day bigint,
        time_epoch timestamp,
        t_counter counter,
        PRIMARY KEY(day,time_epoch))
    WITH CLUSTERING ORDER BY (time_epoch DESC);
    

    After inserting a few rows, the clustering order becomes evident:

    > SELECT * FROM event_counter ;
        WHERE day=20210219 
          AND time_epoch>='2021-02-18 18:00'
          AND time_epoch<'2021-02-19 8:00';
    
     day      | time_epoch                      | t_counter
    ----------+---------------------------------+-----------
     20210219 | 2021-02-19 14:09:21.625000+0000 |         1
     20210219 | 2021-02-19 14:08:32.913000+0000 |         2
     20210219 | 2021-02-19 14:08:28.985000+0000 |         1
     20210219 | 2021-02-19 14:08:05.389000+0000 |         1
    
    (4 rows)
    

    Now SELECTing the MAX t_counter in that range should work:

    > SELECT day,max(t_counter) as max
    FROM event_counter
    WHERE day=20210219
      AND time_epoch>='2021-02-18 18:00'
      AND time_epoch<'2021-02-19 09:00';
    
     day      | max
    ----------+-----
     20210219 |   2