Cassandra Collection Size Limitation


As per this page

https://docs.datastax.com/en/cql-oss/3.3/cql/cql_using/useCollections.html

Observe the following limitations of collections:

  1. Never insert more than 2 billion items in a collection, as only that number can be queried.
  2. The maximum number of keys for a map collection is 65,535.
  3. The maximum size of an item in a list or a map collection is 2GB.
  4. The maximum size of an item in a set collection is 65,535 bytes.

Taking the maximum limits in points 1 and 3 together, a list could theoretically hold 2 billion items of 2 GB each, which comes to 4000 petabytes of data. That would be for a single CQL row, and it would be multiplied again by the number of CQL rows.

Am I interpreting this correctly?

If yes, why does Cassandra allow such a ginormous size for a single column? It does not sound right, even theoretically.


Solution

  • Your interpretation is partially correct.

    2 GB is the maximum size of any column value in Cassandra, whether it is a collection or another type. If a collection in a row already holds one item at the 2 GB maximum, you can't add more items to that collection for that row. So it is theoretically possible to insert 2 billion items into a non-frozen collection, as long as their total size doesn't exceed 2 GB.

    Nonetheless, coming close to either the size limit or the cardinality limit is strongly discouraged.
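
    To make the difference between the two readings concrete, here is a small arithmetic sketch. It assumes decimal units (1 GB = 10^9 bytes) and takes the 2 billion item and 2 GB figures at face value; the variable names are just for illustration.

```python
# Decimal units, matching the "2 GB" and "2 billion" figures quoted above.
GB = 10**9
PB = 10**15

MAX_ITEMS = 2 * 10**9        # cardinality limit quoted from the docs
MAX_VALUE_BYTES = 2 * GB     # per-value size limit quoted from the docs

# Reading in the question: multiply the two limits together.
naive_total_bytes = MAX_ITEMS * MAX_VALUE_BYTES
naive_total_pb = naive_total_bytes // PB

# Reading in this answer: the whole collection shares a single 2 GB budget,
# so at maximum cardinality each item can average only this many bytes.
avg_item_bytes_at_max_items = MAX_VALUE_BYTES / MAX_ITEMS

print(naive_total_pb)               # 4000
print(avg_item_bytes_at_max_items)  # 1.0
```

    In other words, the 4000 PB figure only appears if the two limits are multiplied; under a shared 2 GB budget, 2 billion items would have to average about one byte each.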

    Some guardrails were introduced in Cassandra 4.1 to control collection misuse:

    • collection_size_warn_threshold
    • collection_size_fail_threshold
    • items_per_collection_warn_threshold
    • items_per_collection_fail_threshold
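
    As a rough illustration of how a warn/fail threshold pair behaves, the sketch below classifies a collection on the client side. It is not Cassandra's implementation: the limit values, the `check` and `check_collection` helpers, and the "strictly greater than" comparison are all assumptions made for the example.

```python
from enum import Enum


class Verdict(Enum):
    OK = "ok"
    WARN = "warn"   # the operation proceeds, but a warning is raised
    FAIL = "fail"   # the operation is rejected


def check(value, warn_threshold, fail_threshold):
    """Classify a measured value against a warn/fail threshold pair.

    A threshold of None means that check is disabled.
    """
    if fail_threshold is not None and value > fail_threshold:
        return Verdict.FAIL
    if warn_threshold is not None and value > warn_threshold:
        return Verdict.WARN
    return Verdict.OK


# Hypothetical limits chosen for illustration only; they are not defaults.
ITEMS_WARN, ITEMS_FAIL = 100, 1000
SIZE_WARN, SIZE_FAIL = 64 * 1024, 1024 * 1024  # bytes


def check_collection(items):
    """Return (item-count verdict, total-size verdict) for a list of bytes."""
    item_verdict = check(len(items), ITEMS_WARN, ITEMS_FAIL)
    size_verdict = check(sum(len(i) for i in items), SIZE_WARN, SIZE_FAIL)
    return item_verdict, size_verdict


# 150 small items: over the item-count warn limit, well under the size limit.
print(check_collection([b"x" * 10] * 150))
```

    The point of having two levels is that a warn threshold can surface problematic usage early, while a fail threshold stops it outright.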

    As a parallel, Cassandra has a WARN threshold for large partitions that can be triggered during compaction: compaction_large_partition_warning_threshold (default 100MB).

    Despite the recommended maximum of 100MB per partition, I've seen partitions of nearly 100GB. As with oversized collections, oversized partitions are possible but strongly discouraged because of the performance impact.