
Cassandra Commit and Recovery on a Single Node

I am new to Cassandra and have been searching for information about commits and crash recovery on a single node. I am hoping someone can clarify the details.

I am testing Cassandra, so I set it up on a single node. I am using the DataStax stress tool to insert millions of rows. What happens if there is a power failure or an unexpected system shutdown? Will all the data that was in Cassandra's memory be written to disk when Cassandra restarts (I guess the commit log acts as an intermediary)? How long does this process take?



  • Cassandra's commit log is what makes its writes durable. When you write to Cassandra, the write is appended to the commit log before it is acknowledged to the client. This means every write for which the client receives a successful response has been written to the commit log. The write is also applied to the current memtable, which is eventually flushed to disk as an SSTable once it is large enough. That flush can happen long after the write was made.

    However, for performance reasons the commit log is not synced to disk immediately. The default is periodic mode (set by the commitlog_sync parameter in cassandra.yaml) with a period of 10 seconds (set by commitlog_sync_period_in_ms in cassandra.yaml). This means the commit log is synced to disk every 10 seconds, so you could lose up to 10 seconds of acknowledged writes if the server loses power. If you had multiple nodes in your cluster and used a replication factor greater than one, multiple nodes would have to lose power within the same 10 seconds for any data to be lost.

    If this risk window isn't acceptable, you can use batch mode for the commit log. In this mode, writes are not acknowledged to the client until the commit log has been synced to disk. The batching window is set by commitlog_sync_batch_window_in_ms, which defaults to 50 ms. Batch mode will significantly increase your write latency and will probably decrease throughput as well, so use it only if the cost of losing a few acknowledged writes is high. It is especially important to store your commit log on a separate drive when using this mode.

    If your server loses power, Cassandra replays the commit log on startup to rebuild its memtables. This usually takes seconds, though it can take minutes on very write-heavy servers.

    If you want to make sure the data in the memtables is written to disk, you can run 'nodetool flush' (this operates per node). This writes the memtables out as new SSTables and deletes the commit logs that refer to the flushed data.
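
The difference between periodic and batch sync can be illustrated with a toy model. The Python sketch below is not Cassandra's implementation: `ToyCommitLog` and `simulate` are hypothetical names, the periodic sync is checked on each write rather than by a background timer, and batch mode syncs on every write instead of grouping writes within a window. It only shows which acknowledged writes would survive a power loss and be available for replay.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCommitLog:
    """Toy model of a commit log (illustrative only, not Cassandra's code)."""
    mode: str = "periodic"        # "periodic" or "batch", like commitlog_sync
    sync_period_ms: int = 10_000  # mirrors commitlog_sync_period_in_ms
    synced: list = field(default_factory=list)    # on disk, survives power loss
    unsynced: list = field(default_factory=list)  # buffered only, lost on power loss
    last_sync_ms: int = 0

    def write(self, now_ms, mutation):
        """Append a mutation; when this returns, the write counts as acknowledged."""
        self.unsynced.append(mutation)
        if self.mode == "batch":
            # Batch mode: sync to disk before acknowledging the write.
            self.sync(now_ms)
        elif now_ms - self.last_sync_ms >= self.sync_period_ms:
            # Periodic mode: sync only once the period has elapsed.
            self.sync(now_ms)

    def sync(self, now_ms):
        """Move buffered mutations to the durable part of the log."""
        self.synced.extend(self.unsynced)
        self.unsynced.clear()
        self.last_sync_ms = now_ms

    def power_loss_and_replay(self):
        """Drop unsynced mutations and return what a restart could replay."""
        self.unsynced.clear()
        return list(self.synced)


def simulate(mode, n_writes=15, interval_ms=1_000):
    """Issue one write per interval, then cut power right after the last write."""
    log = ToyCommitLog(mode=mode)
    for i in range(n_writes):
        log.write(now_ms=i * interval_ms, mutation=f"row-{i}")
    recovered = log.power_loss_and_replay()
    return n_writes, len(recovered)  # (acknowledged, recovered)


if __name__ == "__main__":
    print("periodic:", simulate("periodic"))
    print("batch:   ", simulate("batch"))
```

With one write per second for 15 seconds, the periodic model recovers 11 of the 15 acknowledged writes (everything up to the sync at the 10-second mark), while the batch model recovers all 15. Real numbers depend on write rate and on when the failure happens relative to the last sync.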