java cassandra timestamp datastax-java-driver

When to use UUID instead of millisecond timestamp in Cassandra?


I have created a table in Cassandra whose primary key is a column of type timeuuid. However, I can already identify each record uniquely using just a millisecond-precision timestamp stored as a bigint.

I use the DataStax Java driver to connect to Cassandra. Before inserting a record into the database, I convert its millisecond timestamp into a UUID. This is overhead that could be removed.

  1. Can someone explain the benefits of using timeuuid instead of bigint, given that the records can be identified without timeuuid's uniqueness?
  2. Is there any performance difference between the timeuuid and bigint data types?

Solution

  • There shouldn't be a big performance impact from generating a timeuuid from a timestamp. timeuuid is useful when many events can happen in the same millisecond and you need them sorted: its timestamp has 100-nanosecond resolution, so you can get up to 10,000 different values within one millisecond. A typical use case is a table with a structure like this:

    create table tuuid (
      pk int,
      tuuid timeuuid, 
      ....
      ....,
      primary key (pk, tuuid));
    

    In this case, you get sorting (ascending or descending) together with uniqueness of the tuuid values. Of course, you could use a primary key of (pk, timestamp, random-value) instead, but with timeuuid you don't need an additional column for uniqueness. One drawback of timeuuid is integration with Spark, for example: Spark doesn't have this type, so it may not be able to push filters down.

    If you don't need uniqueness, just switch to timestamp. Internally it is represented as an 8-byte long, the same as bigint, but you don't need to do the conversions yourself.
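
To make the "10,000 values per millisecond" point concrete, here is a minimal sketch of how a version 1 (time-based) UUID packs a timestamp counted in 100-nanosecond units. It uses only java.util.UUID from the JDK. The class name TimeUuidSketch, its methods, and the fixed clock-sequence/node value are hypothetical and exist only for illustration; in real code you would normally rely on the UUID helpers that ship with the DataStax driver rather than something like this.

```java
import java.util.UUID;

// Hypothetical illustration class; not part of the DataStax driver.
final class TimeUuidSketch {
    // 100-ns intervals between the UUID epoch (1582-10-15) and the Unix epoch.
    static final long UUID_EPOCH_OFFSET = 0x01B21DD213814000L;

    // Assumed fixed clock sequence and node, with the IETF variant bits set.
    // A real generator derives these per host/process.
    static final long CLOCK_SEQ_AND_NODE = 0x8000000000000001L;

    // Builds a version 1 UUID from Unix milliseconds plus a sub-millisecond
    // tick in the range 0..9999 (each tick is one 100-ns slot).
    static UUID fromMillis(long unixMillis, int tick) {
        if (tick < 0 || tick > 9_999) {
            throw new IllegalArgumentException("tick must be in 0..9999");
        }
        long ts = unixMillis * 10_000L + tick + UUID_EPOCH_OFFSET;
        long msb = ((ts & 0xFFFFFFFFL) << 32)        // time_low
                 | (((ts >>> 32) & 0xFFFFL) << 16)   // time_mid
                 | 0x1000L                           // version 1
                 | ((ts >>> 48) & 0x0FFFL);          // time_hi
        return new UUID(msb, CLOCK_SEQ_AND_NODE);
    }

    // Recovers the Unix millisecond value, dropping the sub-millisecond tick.
    static long toMillis(UUID uuid) {
        return (uuid.timestamp() - UUID_EPOCH_OFFSET) / 10_000L;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        UUID first = fromMillis(now, 0);
        UUID second = fromMillis(now, 1);
        System.out.println(first + " -> " + toMillis(first));
        System.out.println(second + " -> " + toMillis(second));
    }
}
```

Both UUIDs map back to the same millisecond, yet they are distinct values, and the one with the larger embedded timestamp sorts later as a timeuuid clustering column. A bigint or timestamp column holding only the millisecond value could not tell them apart.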