
Delete rows vs. delete columns performance


I'm creating the data model for a time-series application on Cassandra 2.1.3. We will be preserving X amount of data for each user of the system, and I'm wondering what the best approach is to design for this requirement.

Option 1:

Use a 'bucket' in the partition key, so that data for a given period goes into the same row. Something like this:

((id, bucket), timestamp) -> data

I can delete a single row at once at the expense of maintaining this bucket concept. It also limits the range I can query on timestamp, probably resulting in several queries.
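
Roughly, in CQL, I mean something like this (table and column names, the data type, and the bucket granularity are just illustrative):

    CREATE TABLE events_by_bucket (
        id        text,       -- user id
        bucket    text,       -- e.g. a month bucket such as '2015-03' (assumed granularity)
        timestamp timestamp,
        data      blob,
        PRIMARY KEY ((id, bucket), timestamp)
    ) WITH CLUSTERING ORDER BY (timestamp DESC);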

Option 2:

Store all the data in the same row. Deletes are then per column.

(id, timestamp) -> data

Range queries are easy again. But what about performance after many column deletes?
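
Again roughly, in CQL (same illustrative names):

    CREATE TABLE events (
        id        text,       -- user id
        timestamp timestamp,
        data      blob,
        PRIMARY KEY (id, timestamp)
    ) WITH CLUSTERING ORDER BY (timestamp DESC);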

Given that we plan to use TTL to let the data expire, which of the two models would deliver the best performance? Is the tombstone overhead of Option 1 << Option 2, or will there be a tombstone per column in both models anyway?
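
For reference, the TTL could be applied per write or as a table-wide default; a sketch (the 30-day value is just a placeholder):

    -- per-write TTL
    INSERT INTO events (id, timestamp, data)
    VALUES ('user1', '2015-03-21 12:00:00', 0x00)
    USING TTL 2592000;

    -- or a table-wide default applied to every write
    ALTER TABLE events WITH default_time_to_live = 2592000;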

I'm trying to avoid burying myself in the tombstone graveyard.


Solution

  • I think it will all depend on how much data you plan on having for the given partition key you end up choosing, what your TTL is and what queries you are making.

    I typically lean towards Option #1, especially if your TTL is the same for all writes. In addition, if you are using LeveledCompactionStrategy or DateTieredCompactionStrategy, Cassandra will do a great job of keeping data for the same partition in the same SSTable, which will greatly improve read performance.
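
    For example, switching a table to LCS is a one-liner (DateTieredCompactionStrategy is set the same way; the table name is from the sketch in your question):

        ALTER TABLE events_by_bucket
        WITH compaction = { 'class' : 'LeveledCompactionStrategy' };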

    If you use Option #2, data for the same partition could easily be spread across multiple levels (if using LCS) or, more generally, across multiple SSTables, which may force you to read from many SSTables depending on the nature of your queries. There is also the issue of hotspotting, where you could overload particular Cassandra nodes if you have a really wide partition.

    The other benefit of Option #1 (which you allude to) is that you can easily delete the entire partition, which creates a single tombstone marker and is much cheaper. Also, if you are using the same TTL everywhere, data within that partition will expire at pretty much the same time.
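
    Concretely, a single statement like the following (using your sketched Option #1 table) drops the whole partition with one partition-level tombstone, rather than one tombstone per column:

        DELETE FROM events_by_bucket
        WHERE id = 'user1' AND bucket = '2015-03';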

    I do agree that it is a bit of a pain to have to make multiple queries to read across multiple partitions, as it pushes some complexity into the application. You may also need to maintain a separate table to keep track of the buckets for a given id if they cannot be determined implicitly.
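
    Such a lookup table could be as simple as this (a sketch, assuming the buckets cannot be derived from the timestamp alone):

        CREATE TABLE buckets_by_id (
            id     text,
            bucket text,
            PRIMARY KEY (id, bucket)
        );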

    As far as performance goes, do you see it as likely that you will need to read across partitions when your application makes queries? For example, if you have a query for 'the most recent 1000 records' and a partition is typically wider than that, you may only need to make one query with Option #1. However, if you want a query like 'give me all records', Option #2 may be better, as otherwise you'll need to make a query for each bucket.
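
    To illustrate, the 'most recent 1000 records' case under Option #1 could be served by a single-partition query along these lines (again using the assumed schema):

        SELECT timestamp, data FROM events_by_bucket
        WHERE id = 'user1' AND bucket = '2015-03'
        LIMIT 1000;

    Since the sketched table clusters by timestamp in descending order, the LIMIT returns the newest rows first.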