This document lists a number of CQL limits for Cassandra 2.2. I'm particularly interested in the Collection limits for Set and List. If I've interpreted it correctly, the document states that values in Sets are limited to 65535 bytes.
As far as I know, this limit exists because set identity is implemented by storing each element as a composite value in the column name of the storage engine's cell (similar to the clustering column value limit), and CQL restricts column names to that many bytes.
Consider a table with a Set, like
CREATE TABLE test.bounds (
someid text,
someorder text,
words set<text>,
PRIMARY KEY (someid, someorder)
)
with
PreparedStatement ps = session.prepare("INSERT INTO test.bounds (someid, someorder, words) VALUES (?, ?, ?)");
BoundStatement bs = ps.bind("id", "order", ImmutableSet.of(StringUtils.repeat('a', 66000)));
session.execute(bs);
This will throw the expected exception
Caused by: com.datastax.driver.core.exceptions.InvalidQueryException: The sum of all clustering columns is too long (66024 > 65535)
Now if I change the table to use a List instead of a Set
CREATE TABLE test.bounds (
someid text,
someorder text,
words list<text>,
PRIMARY KEY (someid, someorder)
)
and use
BoundStatement bs = ps.bind("id", "order", ImmutableList.of(StringUtils.repeat('a', 66000)));
I do not receive an exception. The document, however, states that List value sizes are also limited to 65535 bytes. Is the document incorrect or am I misinterpreting?
I assumed List values are implemented as simple column values in the underlying storage and the order is maintained through their timestamps.
The documentation here is wrong as far as I understand it. That limitation was changed in protocol version 3 (introduced in C* 2.1). From the native protocol specification under the changes section for protocol 3:
- The serialization format for collection has changed (both the collection size and the length of each argument is now 4 bytes long). See Section 6.
So as long as you use protocol version 3 or higher, the protocol lets a list hold up to 2^31-1 (2147483647) elements, each up to 2^31-1 bytes long.
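To make the format change quoted above concrete, here is a rough sketch of how a list<text> value would be laid out with 2-byte lengths (before protocol version 3) and with 4-byte lengths (version 3 and later). The class and method names (CollectionLengthSketch, serializeV2, serializeV3) are made up for illustration; this is not the driver's actual codec, only the length-prefix layout the specification describes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical illustration only; not the DataStax driver's real codec.
final class CollectionLengthSketch {
    static final int MAX_UNSIGNED_SHORT = 65535;

    // Pre-v3 layout: 2-byte element count, then each element with a 2-byte length.
    static ByteBuffer serializeV2(List<String> values) {
        if (values.size() > MAX_UNSIGNED_SHORT) {
            throw new IllegalArgumentException("too many elements for a 2-byte count");
        }
        byte[][] encoded = encode(values);
        int size = 2;
        for (byte[] e : encoded) {
            if (e.length > MAX_UNSIGNED_SHORT) {
                throw new IllegalArgumentException(
                        "element of " + e.length + " bytes does not fit a 2-byte length");
            }
            size += 2 + e.length;
        }
        ByteBuffer out = ByteBuffer.allocate(size);
        out.putShort((short) encoded.length);   // low 16 bits, read back as unsigned
        for (byte[] e : encoded) {
            out.putShort((short) e.length);
            out.put(e);
        }
        out.flip();
        return out;
    }

    // v3+ layout: 4-byte element count, then each element with a 4-byte length.
    static ByteBuffer serializeV3(List<String> values) {
        byte[][] encoded = encode(values);
        int size = 4;
        for (byte[] e : encoded) {
            size += 4 + e.length;
        }
        ByteBuffer out = ByteBuffer.allocate(size);
        out.putInt(encoded.length);
        for (byte[] e : encoded) {
            out.putInt(e.length);
            out.put(e);
        }
        out.flip();
        return out;
    }

    private static byte[][] encode(List<String> values) {
        byte[][] encoded = new byte[values.size()][];
        for (int i = 0; i < encoded.length; i++) {
            encoded[i] = values.get(i).getBytes(StandardCharsets.UTF_8);
        }
        return encoded;
    }
}
```

With the 66000-character string from the question, serializeV2 rejects the element because its length cannot be written in two bytes, while serializeV3 writes it without complaint. That matches the List behaviour observed above; the Set failure comes from a different, server-side check.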
Edit: I just noticed your comment about set identity. That may be a limitation of the storage engine itself, so perhaps the documentation was left this way for that reason, but the protocol itself supports larger collections now. I will look into getting that nuance documented.