apache-spark · cassandra · datastax · cassandra-3.0

How to partition a Cassandra table on a timestamp?


I have a table like this:

CREATE TABLE ks.log_by_date (
    column_name text,
    status text,
    error_msg text,
    last_update_date date,
    last_update_timestamp timestamp,
    updated_user text,
    PRIMARY KEY ((column_name), last_update_date)
) WITH CLUSTERING ORDER BY (last_update_date DESC);

INSERT INTO ks.log_by_date (column_name,last_update_date,error_msg,last_update_timestamp,status,updated_user) 
  VALUES ('column_log_by_date','2018-10-23','NONE',1540302120001,'ERROR','user1');

INSERT INTO ks.log_by_date (column_name,last_update_date,error_msg,last_update_timestamp,status,updated_user) 
  VALUES ('column_log_by_date','2018-10-23','NONE',1540302340001,'SUCCESS','user1');

When I insert these two rows with different values in the last_update_timestamp column (1540302120001 and 1540302340001), the first row is overwritten.

What am I doing wrong here? Why is it overwriting the row instead of keeping two separate rows? How can I make it insert two rows here, based on last_update_timestamp?

Your help is highly appreciated.


Solution

  • If you want the 2 rows with different last_update_timestamp values to be treated as separate rows, then you need to put last_update_timestamp into the primary key as a clustering column:

    PRIMARY KEY ((column_name), last_update_date, last_update_timestamp)
    

    but I don't know whether you really need last_update_date in the key at all - do you need to select all rows that have some specific last_update_date? If so, keeping it as the first clustering column makes sense; a revised table is sketched below.
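
    A minimal sketch of the revised table, assuming you keep last_update_date as the first clustering column and want the newest entries first (keyspace, table, and column names are taken from the question):

        CREATE TABLE ks.log_by_date (
            column_name text,
            status text,
            error_msg text,
            last_update_date date,
            last_update_timestamp timestamp,
            updated_user text,
            -- last_update_timestamp is now part of the primary key, so inserts
            -- with the same column_name and last_update_date but different
            -- timestamps no longer overwrite each other
            PRIMARY KEY ((column_name), last_update_date, last_update_timestamp)
        ) WITH CLUSTERING ORDER BY (last_update_date DESC, last_update_timestamp DESC);

    With this key, the two INSERT statements from the question produce two separate rows, and you can still read everything logged on a given day by filtering on the partition key and the first clustering column:

        SELECT * FROM ks.log_by_date
          WHERE column_name = 'column_log_by_date' AND last_update_date = '2018-10-23';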