Tags: java, cassandra, cassandra-3.0, spark-cassandra-connector

Cassandra: How are updates handled if they happen before the commit log is flushed to the Memtable?


Write path for Cassandra:

Step 1 - Data is written to the commit log immediately.
Step 2 - After a threshold is met, the commit log is flushed into the Memtable.
Step 3 - Once the size threshold for the Memtable is met, data is flushed to disk as an SSTable.

In the above process, if data is updated while it is still at Step 1, does it require any special handling?

For example, suppose we have an employee column family. A program starts writing data into it. After 10 rows have been inserted, an update is issued for the 3rd row while the data has still not been flushed into the Memtable.

Will Cassandra handle this scenario the same way as crash recovery?

Please share your views on this.


Solution

  • Here is the write path for Cassandra:

    1. Data is appended to the commit log, provided durable_writes was set to true when the keyspace was created (it is true by default).
    2. Data is also written to the Memtable immediately. The commit log is not flushed into the Memtable.
    3. Once the Memtable's thresholds are met, the data is flushed to disk as an SSTable.
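The three steps above can be sketched as a small in-memory Java model (Java 16+ for the record). Everything in it is hypothetical. WritePathSketch, Mutation, and the key-count flush threshold are illustrative names, not Cassandra classes. Real flushes are triggered by memory and commit log space thresholds, not by a key count. The sketch shows that every mutation goes to both the commit log and the Memtable. It also shows that an update to an existing key replaces the older value in the Memtable when its timestamp is newer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of the write path; none of these names are
// Cassandra classes, and real memtables/commit log segments are far richer.
class WritePathSketch {
    // A single write: partition key, value, and write timestamp.
    record Mutation(String key, String value, long timestamp) {}

    private final boolean durableWrites; // mirrors the keyspace's durable_writes option
    private final int flushThreshold;    // hypothetical: flush after this many distinct keys
    final List<Mutation> commitLog = new ArrayList<>();
    TreeMap<String, Mutation> memtable = new TreeMap<>();
    final List<Map<String, Mutation>> sstables = new ArrayList<>();

    WritePathSketch(boolean durableWrites, int flushThreshold) {
        this.durableWrites = durableWrites;
        this.flushThreshold = flushThreshold;
    }

    void write(String key, String value, long timestamp) {
        Mutation m = new Mutation(key, value, timestamp);
        // Step 1: append to the commit log, only for durable keyspaces.
        if (durableWrites) {
            commitLog.add(m);
        }
        // Step 2: apply to the memtable immediately; for the same key the
        // higher timestamp wins (ties go to the incoming write in this sketch).
        memtable.merge(key, m, (old, incoming) ->
                incoming.timestamp() >= old.timestamp() ? incoming : old);
        // Step 3: flush to an immutable "SSTable" once the threshold is hit.
        if (memtable.size() >= flushThreshold) {
            flush();
        }
    }

    void flush() {
        sstables.add(Map.copyOf(memtable));
        memtable = new TreeMap<>();
        // With a single memtable, everything logged so far is now on "disk",
        // so this simplified model can drop the replayable commit log entries.
        commitLog.clear();
    }

    public static void main(String[] args) {
        WritePathSketch db = new WritePathSketch(true, 100);
        for (int i = 1; i <= 10; i++) {
            db.write("emp" + i, "v1", i);
        }
        db.write("emp3", "v2", 11); // update of row 3 before any flush
        System.out.println(db.memtable.get("emp3").value()); // v2
        System.out.println(db.commitLog.size());              // 11
    }
}
```

Running main mirrors the employee example from the question. The update to emp3 lands in the Memtable as the newer value, and the commit log holds all 11 mutations because nothing has been flushed yet.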

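    If an update occurs at any time before the flush to disk, the Memtable captures the updated record and the update is also appended to the commit log. If a crash then happens, all of these records are replayed from the commit log when Cassandra restarts, so writes and updates in keyspaces with durable_writes enabled are not lost. How much of the most recent data survives a crash also depends on the commitlog_sync setting (periodic by default), which controls how often the commit log is synced to disk.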
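Crash recovery can be sketched the same way. In this hypothetical model, which is again not Cassandra's API, the Memtable is lost on a crash and rebuilt on restart. The rebuild replays every commit log entry in order, using the same higher-timestamp-wins rule as live writes. The model makes two assumptions: durable_writes is true, and the commit log is already safely on disk. It also simplifies reconciliation to whole rows, whereas Cassandra uses per-cell timestamps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of crash recovery: the memtable lives only in memory,
// so after a crash it is rebuilt by replaying the commit log in order.
class CommitLogReplaySketch {
    record Mutation(String key, String value, long timestamp) {}

    final List<Mutation> commitLog = new ArrayList<>(); // assumed already durable on disk
    Map<String, Mutation> memtable = new TreeMap<>();

    // Same reconcile rule for live writes and replay: higher timestamp wins.
    static void apply(Map<String, Mutation> table, Mutation m) {
        table.merge(m.key(), m, (old, incoming) ->
                incoming.timestamp() >= old.timestamp() ? incoming : old);
    }

    void write(String key, String value, long timestamp) {
        Mutation m = new Mutation(key, value, timestamp);
        commitLog.add(m); // assumes durable_writes = true
        apply(memtable, m);
    }

    void crash() {
        memtable = new TreeMap<>(); // in-memory data is lost
    }

    void restart() {
        Map<String, Mutation> rebuilt = new TreeMap<>();
        for (Mutation m : commitLog) {
            apply(rebuilt, m); // replay in log order
        }
        memtable = rebuilt;
    }

    public static void main(String[] args) {
        CommitLogReplaySketch node = new CommitLogReplaySketch();
        for (int i = 1; i <= 10; i++) {
            node.write("emp" + i, "salary=100", i);
        }
        node.write("emp3", "salary=200", 11); // the update to row 3
        Map<String, Mutation> beforeCrash = new TreeMap<>(node.memtable);

        node.crash();
        node.restart();

        System.out.println(node.memtable.equals(beforeCrash)); // true
        System.out.println(node.memtable.get("emp3").value()); // salary=200
    }
}
```

The update to emp3 is itself a commit log entry with a newer timestamp, so replay reproduces the pre-crash Memtable exactly. In this model, updates that arrive before a flush need no special handling.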
