
Cassandra Cell Number Limitation


Is the limit of 2 billion cells per partition still valid?

http://wiki.apache.org/cassandra/CassandraLimitations

Let's say you store 16 bytes on average per cell. Then you can "just" persist 16 * 2e9 bytes = 32 GB of data (plus column names) on one machine!? Or, if you imagine a square table, you would only be able to store 44,721 rows with 44,721 columns each!?

Doesn't really sound like Big Data.

Is this correct?

Thanks!

Malte


Solution

  • The 2 billion cell limit is still valid, and you most likely want to remodel your data if you start seeing that many cells in a partition. Note that the limit applies to each partition, not to a machine or a table.

    The maximum number of cells (rows x columns) in a single partition is 2 billion.

    A partition is defined by the partition key in CQL, which determines where a particular piece of data lives. For example, suppose I had two nodes with fictional token ranges of 0-100 and 100-200: partition keys that hashed to a value between 0 and 100 would reside on the first node, and those that hashed to a value between 100 and 200 would reside on the second node. In reality, Cassandra uses the Murmur3 algorithm to hash partition keys, generating values between -2^63 and 2^63-1.

    The real limitation tends to be how many unique values you have for your partition key. If a single column does not provide much uniqueness, many users combine columns to generate more (a composite partition key).

    http://www.datastax.com/documentation/cql/3.0/cql/cql_reference/create_table_r.html

    More info on hashing and how C* holds data.

    http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architecturePartitionerAbout_c.html
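
To make the two ideas in the answer concrete, here is a minimal Python sketch, not Cassandra code. It assumes the fictional two-node ring from the example, uses MD5 as a stand-in hash (Cassandra itself uses Murmur3), and invents a hypothetical sensor workload with a (sensor_id, day) partition key to show how combining columns keeps each partition far below the 2 billion cell limit. All names (`toy_token`, `owner`, `cells_per_partition`, the sensor figures) are illustrative assumptions.

```python
import hashlib
from bisect import bisect_left

# Fictional ring from the answer: node-1 owns tokens up to 100,
# node-2 owns tokens from 101 to 200. Each entry is (upper bound, owner).
RING = [(100, "node-1"), (200, "node-2")]

MAX_CELLS_PER_PARTITION = 2_000_000_000


def toy_token(partition_key: tuple) -> int:
    """Hash a partition key into 1..200 (stand-in for Murmur3, assumption)."""
    raw = "|".join(map(str, partition_key)).encode()
    return int.from_bytes(hashlib.md5(raw).digest()[:8], "big") % 200 + 1


def owner(partition_key: tuple) -> str:
    """Return the node whose token range contains the key's token."""
    bounds = [upper for upper, _ in RING]
    return RING[bisect_left(bounds, toy_token(partition_key))][1]


def cells_per_partition(rows_per_day: int, columns: int, days: int,
                        bucket_by_day: bool) -> int:
    """Cells (rows x columns) that end up in one partition.

    Without bucketing, all days of one sensor share a single partition;
    with a (sensor_id, day) partition key, each day gets its own partition.
    """
    days_in_partition = 1 if bucket_by_day else days
    return rows_per_day * columns * days_in_partition


if __name__ == "__main__":
    # Hypothetical workload: 10 readings per second, 4 columns, 3 years.
    rows_per_day, columns, days = 864_000, 4, 1095

    single = cells_per_partition(rows_per_day, columns, days, bucket_by_day=False)
    bucketed = cells_per_partition(rows_per_day, columns, days, bucket_by_day=True)
    print("sensor_id only:    ", single, single > MAX_CELLS_PER_PARTITION)
    print("(sensor_id, day):  ", bucketed, bucketed > MAX_CELLS_PER_PARTITION)

    # Different days of the same sensor are hashed independently,
    # so they can land on different nodes.
    for day in ("2014-01-01", "2014-01-02", "2014-01-03"):
        print(day, "->", owner(("sensor-1", day)))
```

Under these assumed numbers, keying by sensor alone would exceed the limit within three years, while adding the day to the partition key leaves each partition at a few million cells and spreads the data over many more tokens.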