I have a table defined like:
Keyspace:
CREATE KEYSPACE messages WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true;
Table:
CREATE TABLE messages.textmessages (
categoryid int,
date timestamp,
messageid timeuuid,
message text,
userid int,
PRIMARY KEY ((categoryid, date), messageid)
) WITH CLUSTERING ORDER BY (messageid ASC);
The goal is to have a wide row time-series storage such that categoryid
and date
(beginning of day) constitutes my partition key and the messageid
provides the clustering. This enables me to do queries like:
SELECT * FROM messages.textmessages WHERE categoryid=2 AND date='2019-05-14 00:00:00.000+0300' AND messageId > maxTimeuuid('2019-05-14 00:00:00.000+0300') AND messageId < minTimeuuid('2019-05-15 00:00:00.000+0300')
to get messages in a given day; it works so well so fast!
Problem
I need to be able to count the messages in a given day by substituting SELECT *
above with SELECT COUNT(*)
. This takes very long even with a little less than 100K entries in the column family; it actually times out on cqlsh
.
I have read and understood quite a bit why COUNT
is an expensive operation for a distributed database like Cassandra in Counting keys? Might as well be counting stars
Question
Why would this query take so long even when:
SELECT COUNT(*) FROM messages.textmessages WHERE categoryid=2 AND date='2019-05-14 00:00:00.000+0300' AND messageId > maxTimeuuid('2019-05-14 00:00:00.000+0300') AND messageId < minTimeuuid('2019-05-15 00:00:00.000+0300')
This is understandably caused by a common pitfall when the concept of 'everything-is-a-write' in Cassandra is overlooked and thence why tombstones happen.
When executing a scan, within or across a partition, we need to keep the tombstones seen in memory so we can return them to the coordinator, which will use them to make sure other replicas also know about the deleted rows. With workloads that generate a lot of tombstones, this can cause performance problems and even exhaust the server heap.
Thanks to @JimWartnick's suggestion on possible tombstone related latency; this was casued by overwhelming amount of tombstones generated by my inserts that had NULL
fields. I did not expect this to cause tombstones, neither did I expect tombstones to be a big deal in query performance; especially the COUNT
.
Solution
One common misconception is that tombstones only appear when the client issues DELETE statements to Cassandra. Some developers assume that it is safe to choose a way of operations which relies on Cassandra being completely tombstone free. In reality there are other many other things causing tombstones apart from issuing DELETE statements. Inserting null values, inserting collections and expiring data using TTL are common sources of tombstones.