How to select last timestamp by distinct columns?


Suppose there is a table like this:

| user_id | location_id | datetime            | other_field |
| ------- | ----------- | ------------------- | ----------- |
| 12      | 1           | 2020-02-01 10:00:00 | asdqwe      |
| 12      | 1           | 2020-02-01 10:30:00 | asdqwe      |
| 12      | 2           | 2020-02-01 10:40:00 | asdqwe      |
| 12      | 2           | 2020-02-01 10:50:00 | asdqwe      |
| 13      | 1           | 2020-02-01 10:10:00 | asdqwe      |
| 13      | 1           | 2020-02-01 10:20:00 | asdqwe      |
| 14      | 3           | 2020-02-01 09:00:00 | asdqwe      |

I want to select the latest datetime for each distinct combination of user_id and location_id. This is the result I am looking for:

| user_id | location_id | datetime            | other_field |
| ------- | ----------- | ------------------- | ----------- |
| 12      | 1           | 2020-02-01 10:30:00 | asdqwe      |
| 12      | 2           | 2020-02-01 10:50:00 | asdqwe      |
| 13      | 1           | 2020-02-01 10:20:00 | asdqwe      |
| 14      | 3           | 2020-02-01 09:00:00 | asdqwe      |

Here is the table description:

    CREATE TABLE mykeyspace.mytable (
        user_id int,
        location_id int,
        datetime timestamp,
        other_field text,
        PRIMARY KEY ((user_id, location_id, other_field), datetime)
    ) WITH CLUSTERING ORDER BY (datetime ASC)
        AND read_repair_chance = 0.0
        AND dclocal_read_repair_chance = 0.1
        AND gc_grace_seconds = 864000
        AND bloom_filter_fp_chance = 0.01
        AND caching = { 'keys' : 'ALL', 'rows_per_partition' : 'NONE' }
        AND comment = ''
        AND compaction = { 'class' : 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold' : 32, 'min_threshold' : 4 }
        AND compression = { 'chunk_length_in_kb' : 64, 'class' : 'org.apache.cassandra.io.compress.LZ4Compressor' }
        AND default_time_to_live = 0
        AND speculative_retry = '99PERCENTILE'
        AND min_index_interval = 128
        AND max_index_interval = 2048
        AND crc_check_chance = 1.0
        AND cdc = false;

Solution

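  • For such things, CQL has the PER PARTITION LIMIT clause (available in Cassandra 3.6+ IIRC). But to use it on your table, you need to change the table definition to CLUSTERING ORDER BY (datetime DESC), so that the newest row comes first in each partition. Then you could write:

    select * from mykeyspace.mytable per partition limit 1;

    and get the row with the latest timestamp for every partition key you have. Note that the partition key here is (user_id, location_id, other_field), so you get one row per combination of all three columns, not just per user_id and location_id.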
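
The clustering order of an existing table cannot be changed with ALTER TABLE, so in practice the change means creating a new table and copying the data over. A minimal sketch of that, assuming a hypothetical new table name `mytable_latest_first` (not from the question) and leaving all other table options at their defaults:

```cql
-- Hypothetical replacement table: same columns and primary key as
-- mykeyspace.mytable, but rows in each partition are stored newest first.
CREATE TABLE mykeyspace.mytable_latest_first (
    user_id int,
    location_id int,
    datetime timestamp,
    other_field text,
    PRIMARY KEY ((user_id, location_id, other_field), datetime)
) WITH CLUSTERING ORDER BY (datetime DESC);

-- Once the data has been copied into the new table, return only the
-- first (that is, latest) row of every partition.
SELECT user_id, location_id, datetime, other_field
FROM mykeyspace.mytable_latest_first
PER PARTITION LIMIT 1;
```

This query still scans every partition in the table, so it may be slow on large data sets; restricting it to specific partition keys with a WHERE clause keeps it cheap.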