Cassandra - how to disable memtable flush


I'm running Cassandra with a very small dataset so that the data can live in the memtable only. My configuration is below:

In jvm.options:

-Xms4G
-Xmx4G

In cassandra.yaml:

memtable_cleanup_threshold: 0.50
memtable_allocation_type: heap_buffers

As per the documentation in cassandra.yaml, memtable_heap_space_in_mb and memtable_offheap_space_in_mb will each be set to 1/4 of the heap size, i.e. roughly 1000MB.

According to the documentation here (http://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__memtable_cleanup_threshold), a memtable flush will trigger if the total size of the memtable(s) goes beyond (1000+1000)*0.50=1000MB.

Now if I perform several write requests that result in roughly 300MB of data, the memtable still gets flushed: I see SSTables being created on the file system (Data.db etc.), and I don't understand why.

Could anyone explain this behavior and point out if I'm missing something here?


Solution

  • Below is the response I got from the Cassandra user group. I'm copying it here in case someone else is looking for similar information.

    After thinking about your scenario I believe your small SSTable size might be due to data compression. By default, all tables enable SSTable compression.

    Let's go through your scenario. Let's say you have allocated 4GB to your Cassandra node. Your memtable_heap_space_in_mb and memtable_offheap_space_in_mb will each come to roughly 1GB. Since you have memtable_cleanup_threshold set to 0.50, memtable cleanup will be triggered when the total allocated memtable space exceeds 1/2GB. Note that the cleanup threshold is 0.50 of 1GB, not of the combined heap and off-heap space. This memtable allocation is the total amount available for all tables on your node, including all system-related keyspaces. The cleanup process will write the largest memtable to disk.

    For your case, I am assuming that you are on a single node with only one table with insert activity. I do not think the commit log will trigger a flush in this circumstance, since by default the commit log has 8192 MB of space, unless the commit log is placed on a very small disk.

    I am assuming your table on disk is smaller than 500MB because of compression. You can disable compression on your table and see if this helps get the desired size.

    I have written up a blog post explaining memtable flushing (http://abiasforaction.net/apache-cassandra-memtable-flush/)

    Let me know if you have any other questions.

    I hope this helps.
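
To make the arithmetic in the response concrete, here is a minimal Python sketch that compares the two readings of the flush threshold for a 4GB heap. The function name and the pool_fraction parameter are hypothetical and exist only for illustration. The 1/4-of-heap pool size and the 0.50 threshold come from the question and the response above. This is a back-of-the-envelope calculation, not Cassandra's actual implementation.

```python
def memtable_flush_threshold_mb(heap_mb, cleanup_threshold, pool_fraction=0.25):
    """Return (pool_mb, threshold_mb) for a single memtable pool.

    Assumption (from the thread): the pool defaults to 1/4 of the heap,
    and cleanup starts once usage exceeds cleanup_threshold * pool size.
    """
    pool_mb = heap_mb * pool_fraction
    return pool_mb, pool_mb * cleanup_threshold


heap_mb = 4 * 1024  # -Xms4G / -Xmx4G from jvm.options

# Reading from the response: the threshold applies to one 1GB pool.
pool_mb, threshold_mb = memtable_flush_threshold_mb(heap_mb, 0.50)

# Reading from the question: heap and off-heap pools are summed first.
question_estimate_mb = (pool_mb + pool_mb) * 0.50

print(f"pool size:              {pool_mb:.0f} MB")
print(f"threshold (response):   {threshold_mb:.0f} MB")
print(f"threshold (question):   {question_estimate_mb:.0f} MB")
```

Under the response's reading, a flush starts at about half the figure the question expected. That memory is shared across every table on the node, including the system keyspaces.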