cassandra datastax cassandra-3.0 datastax-java-driver

CQLSSTableWriter: do I need compaction after ingestion with sstableloader?


I use CQLSSTableWriter to write my data out as SSTables:

 writer.addRow(1, "test", ...);

The data is sorted by partition key and then by clustering key, and I call addRow for each line of the sorted data.
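The ordering step described above can be sketched in plain Java. This is only an illustration: the `Row` class, its fields, and `sortForWriting` are hypothetical names, and it sorts by the natural order of the key values, which is an assumption (Cassandra itself orders partitions by token, not by key value). The commented-out `writer.addRow(...)` call stands in for the real writer from the question.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class SortedRowsSketch {

    /** Hypothetical row shape: one partition key, one clustering key, one value. */
    static final class Row {
        final int partitionKey;
        final String clusteringKey;
        final String value;

        Row(int partitionKey, String clusteringKey, String value) {
            this.partitionKey = partitionKey;
            this.clusteringKey = clusteringKey;
            this.value = value;
        }
    }

    /** Returns a copy sorted by partition key, then by clustering key (natural order). */
    static List<Row> sortForWriting(List<Row> rows) {
        List<Row> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparingInt((Row r) -> r.partitionKey)
                              .thenComparing(r -> r.clusteringKey));
        return sorted;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
            new Row(2, "a", "third"),
            new Row(1, "b", "second"),
            new Row(1, "a", "first"));

        for (Row r : sortForWriting(rows)) {
            // In the real job this is where the writer would be called, e.g.:
            // writer.addRow(r.partitionKey, r.clusteringKey, r.value);
            System.out.println(r.partitionKey + " " + r.clusteringKey + " " + r.value);
        }
    }
}
```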

The data for a given partition is written to a single SSTable (or two at most).

Two questions:

  1. The CQLSSTableWriter builder() does not require a compaction strategy. Is that normal?

  2. The existing table uses LCS compaction, but CQLSSTableWriter doesn't come with any strategy defined. Given that the data never changes after ingestion (in my case!), does it make sense to prevent any compaction from running once I have loaded the SSTables into Cassandra with sstableloader? Or do I always need to run a compaction after every ingestion with sstableloader?

Thanks for helping make this clearer!


Solution

  • 1) Yes, that is normal. CQLSSTableWriter just creates SSTables.

    2) When Cassandra gets the sstable from the sstableloader or nodetool refresh/import it will automatically do any necessary compactions. You don't have to and shouldn't do anything.

    If you really want to, you can disable compaction on the table:

    ALTER TABLE keyspace.table WITH COMPACTION = {'class': 'SizeTieredCompactionStrategy', 'enabled': 'false' };
    

    Cassandra then won't compact the table at all, and the SSTables will stay as they are.

    Having a partition in only 2 SSTables does not necessarily mean that only 2 will be touched on a read. The bloom filters on the other SSTables can still return false positives, and if the number of SSTables continues to climb this will eventually become an issue. However, if your clustering key increases over time, it can also be used to filter out unnecessary SSTables, because the min/max clustering key is kept in each SSTable's metadata and checked in the read path (this is how TWCS and most time-series workloads avoid too much buildup). A growing SSTable count also has a large impact on repairs and other operational tasks.
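    To make the min/max clustering check concrete, here is a minimal sketch in plain Java. It is not Cassandra's actual read path: `SSTableMeta`, `candidates`, and the use of a `long` clustering value are all assumptions made for illustration. It only shows the idea that an SSTable whose clustering range does not overlap the queried range can be skipped without being read.

```java
import java.util.ArrayList;
import java.util.List;

class SSTablePruningSketch {

    /** Hypothetical stand-in for per-SSTable metadata; not a Cassandra class. */
    static final class SSTableMeta {
        final String name;
        final long minClustering;
        final long maxClustering;

        SSTableMeta(String name, long minClustering, long maxClustering) {
            this.name = name;
            this.minClustering = minClustering;
            this.maxClustering = maxClustering;
        }
    }

    /** Keeps only SSTables whose [min, max] clustering range overlaps [queryStart, queryEnd]. */
    static List<String> candidates(List<SSTableMeta> sstables, long queryStart, long queryEnd) {
        List<String> result = new ArrayList<>();
        for (SSTableMeta s : sstables) {
            boolean overlaps = s.maxClustering >= queryStart && s.minClustering <= queryEnd;
            if (overlaps) {
                result.add(s.name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Three ingestion batches with increasing clustering values (e.g. timestamps).
        List<SSTableMeta> sstables = List.of(
            new SSTableMeta("batch-1", 0, 99),
            new SSTableMeta("batch-2", 100, 199),
            new SSTableMeta("batch-3", 200, 299));

        // A query for clustering values 150..220 never needs to open batch-1.
        System.out.println(candidates(sstables, 150, 220)); // prints [batch-2, batch-3]
    }
}
```

    When each ingestion covers a distinct clustering range, as in this example, the number of SSTables a read has to consider stays small even as the total count grows.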

    Ultimately, unless it's a problem, I would seriously recommend leaving compaction as is. Use SizeTiered if you think you are mostly fine; it will keep the SSTable count from growing out of control while doing the fewest reads and writes compared to the other strategies. If your CPU is maxed out on compactions, something else is wrong and you should look into it, since compaction should not consume that much (how do you know it's compactions?). You can always throttle the compaction throughput as well.