amazon-web-services, cassandra, nosql, amazon-dynamodb, database-performance

Why are multi-column keys (composite keys, in Cassandra's terms) not supported in DynamoDB?


I recently transitioned from Cassandra to DynamoDB, and found a difference between the two (pretty significant to me at least). The terms are a bit different, so for simplicity I will just call them partition key and clustering key.

In Cassandra we have a concept called a composite key: the partition key can be a multi-column value, and so can the clustering key. However, it looks like there's no such concept in DynamoDB. The AWS documentation mentions "composite", but there it just means that a primary key can be formed as <partition_key, clustering_key>:

Partition key and sort key – Referred to as a composite primary key, this type of key is composed of two attributes. The first attribute is the partition key, and the second attribute is the sort key.

I used multi-column values (composite in Cassandra's sense) as keys a lot in the past, so I was kind of shocked when I realized this is not supported in DynamoDB. I know it's always an option to do concatenation like this post. My questions are:

  1. Is having a multi-column value as the partition key an anti-pattern? Is this also true for clustering key?
  2. Would a multi-column key cause performance degradation?
  3. If there's no performance degradation, then what are other tradeoffs behind these two implementations?

Solution

  • If you look at how Cassandra implements composite partition keys, you'll see that it simply serializes the multiple partition-key columns into a single key stored in the sstable (I once wrote a detailed explanation of this at https://docs.scylladb.com/architecture/sstable/sstable2/sstable-data-file/ for the open-source Scylla project, which reimplements both Cassandra and DynamoDB).

    DynamoDB chose not to do this serialization for you, and asks you to do it yourself (this is what you call concatenation). I don't think there was any particular reason for this choice; I think it just simplifies the API.

    The only downside I can think of with not having composite keys is that you are not able to index parts of the composite key. In Cassandra, if (a,b) is a composite partition key, you can add a materialized view whose partition key is just a (with b part of the clustering key). In DynamoDB you can't do that with a GSI (DynamoDB's counterpart to Cassandra's materialized views).
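
As a rough illustration of what "doing the serialization yourself" could look like, here is a minimal Python sketch. Everything in it is an assumption made for the example: the helper names (encode_key, decode_key, make_item), the "#" delimiter with backslash escaping, and the attribute names pk and sk. It is not the format Cassandra uses internally, nor anything DynamoDB prescribes; it only shows one way to pack several column values into a single string key and get them back.

```python
# Hypothetical helpers; the delimiter, escaping scheme, and names are assumptions.
DELIM = "#"
ESC = "\\"


def encode_key(*parts):
    """Join several column values into one string key, escaping the delimiter."""
    escaped = [p.replace(ESC, ESC + ESC).replace(DELIM, ESC + DELIM) for p in parts]
    return DELIM.join(escaped)


def decode_key(key):
    """Split a key produced by encode_key back into its original column values."""
    parts, current, i = [], [], 0
    while i < len(key):
        ch = key[i]
        if ch == ESC and i + 1 < len(key):
            # Escaped character: take the next character literally.
            current.append(key[i + 1])
            i += 2
        elif ch == DELIM:
            # Unescaped delimiter: the current component is complete.
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    parts.append("".join(current))
    return parts


def make_item(tenant, user_id, created_at, event_type):
    """Build an item whose partition and sort keys are each a concatenation."""
    return {
        "pk": encode_key(tenant, user_id),            # stands in for a composite partition key
        "sk": encode_key(created_at, event_type),     # stands in for a composite clustering key
    }


item = make_item("us-east", "user#42", "2020-01-01", "login")
```

The escaping step matters only if a column value can itself contain the delimiter; without it, two different column tuples could collapse into the same key.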